
Evo-Merging: Combining Black-Box Language Models for Enhanced AI Services

TLDR: Evo-Merging is a new framework that allows for the merging of multiple large language models (LLMs) even when their internal parameters are inaccessible, a common scenario for ‘Language-Model-as-a-Service’ offerings. It uses a derivative-free evolutionary algorithm with two key components: sparsity-based denoising to filter irrelevant information and sign-aware scaling to dynamically weigh model contributions and resolve conflicts. This approach achieves state-of-the-art performance, demonstrating robustness and scalability in merging over 100 models, significantly outperforming existing methods that require parameter access.

In the rapidly evolving world of artificial intelligence, combining the strengths of multiple language models into a single, more powerful unit is a highly sought-after capability. This process, known as model merging or model fusion, offers an efficient way to enhance AI models without the need for extensive retraining or data collection. Traditionally, these techniques have relied on direct access to a model’s internal parameters, allowing researchers to manipulate and integrate their ‘knowledge’ effectively.

However, the landscape of large language models (LLMs) is shifting. Many advanced LLMs, such as GPT-4, are increasingly offered as black-box services through API interfaces. This means that end-users and developers can utilize these powerful models, but they cannot access or modify their underlying parameters. This presents a significant challenge for traditional model merging techniques, a problem the researchers refer to as black-box model merging (BMM).

Introducing Evo-Merging: A Solution for Black-Box Model Merging

To tackle this challenge, a team of researchers has proposed a novel framework called Evo-Merging. This innovative approach enables effective model merging even when model parameters are inaccessible, relying solely on inference-time API queries. Evo-Merging is built on a derivative-free optimization framework, inspired by evolutionary algorithms, making it particularly well-suited for scenarios where direct parameter access is not possible and user data privacy is paramount.

The Evo-Merging method consists of two crucial components that work in tandem:

  1. Sparsity-Based Denoising: This initial stage identifies and filters out irrelevant or redundant information across the models being merged. Not all parameters from every model contribute meaningfully to a given task, and some may even introduce noise. By selectively retaining only the most significant parameters, this module ensures a cleaner, more focused knowledge base for the merged model. The research provides a formal justification for why pruning the LoRA "A" matrix (a component in Low-Rank Adaptation) is a more effective and safer denoising strategy than pruning other parts of the model.
  2. Sign-Aware Scaling: After denoising, this stage dynamically computes optimal combination weights for the relevant models based on their performance. A key innovation here is the use of both positive and negative weights. Positive weights enhance synergy between models, while negative weights can actively subtract conflicting knowledge. This dynamic scaling mechanism is vital for resolving conflicts and unlocking cross-domain knowledge, leading to better generalization, especially for new, unseen tasks.
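The two stages above can be sketched in a few lines of code. This is a simplified illustration on flat weight-delta vectors, not the paper's actual implementation (which operates on LoRA matrices); the function names and toy values are hypothetical.

```python
def denoise(delta, keep_ratio):
    """Sparsity-based denoising: keep only the largest-magnitude
    entries of a model's weight delta, zeroing out the rest."""
    k = max(1, int(len(delta) * keep_ratio))
    threshold = sorted((abs(v) for v in delta), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in delta]

def merge(deltas, weights, keep_ratio=0.5):
    """Sign-aware scaling: combine denoised deltas with signed weights.
    A negative weight actively subtracts conflicting knowledge."""
    denoised = [denoise(d, keep_ratio) for d in deltas]
    return [sum(w * d[i] for w, d in zip(weights, denoised))
            for i in range(len(deltas[0]))]

# Two toy task vectors; the second partially conflicts with the first,
# so it receives a negative weight.
merged = merge([[0.9, -0.2, 0.05, 0.4],
                [0.8, 0.1, -0.3, -0.4]],
               weights=[1.0, -0.5])
```

In practice, the keep ratios and signed weights are not hand-picked as above; they are exactly the parameters that Evo-Merging's evolutionary search tunes from API feedback.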

How Evo-Merging Works Without Direct Access

Instead of manipulating internal parameters, Evo-Merging leverages an evolutionary algorithm, specifically the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). This algorithm optimizes the merging parameters (sparsity ratios and scaling weights) based on feedback received from API queries. Essentially, it iteratively tests different combinations of model contributions and learns which configurations yield the best performance on a given validation set, all without ever needing to see the models’ internal workings.
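To make the derivative-free loop concrete, here is a minimal (1+λ) evolution strategy in pure Python, a deliberately simplified stand-in for CMA-ES. It only ever calls a black-box score function, never gradients; the toy `score` function is a hypothetical surrogate for validation accuracy obtained via API queries, and all names are illustrative rather than from the paper.

```python
import random

def score(params):
    # Toy black-box objective: pretend the best merge uses
    # sparsity 0.3 and scaling weight 1.2 (higher score is better).
    sparsity, weight = params
    return -((sparsity - 0.3) ** 2 + (weight - 1.2) ** 2)

def evolve(score_fn, init, generations=200, population=8, sigma=0.3, seed=0):
    """Elitist evolutionary search: perturb the best-known parameters,
    keep any candidate that scores higher, and shrink the step size."""
    rng = random.Random(seed)
    best, best_score = list(init), score_fn(init)
    for _ in range(generations):
        for _ in range(population):
            cand = [p + rng.gauss(0.0, sigma) for p in best]
            s = score_fn(cand)
            if s > best_score:
                best, best_score = cand, s
        sigma *= 0.99  # gradually narrow the search radius
    return best

params = evolve(score, init=[0.5, 0.5])
```

Real CMA-ES additionally adapts a full covariance matrix over the search distribution, but the essential point is the same: the optimizer needs only scores from queries, which is what makes the black-box setting tractable.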

Impressive Results and Robustness

Extensive experimental evaluations demonstrate that Evo-Merging achieves state-of-the-art results across a range of tasks, significantly outperforming existing strong baselines. Notably, it shows remarkable performance in both out-of-domain (unseen tasks) and in-domain scenarios. For instance, in out-of-domain tasks, Evo-Merging achieved over a 10-point improvement in both precision and F1 score compared to the best baseline, even though baselines had access to model parameters. It also proved highly effective at integrating knowledge from over 100 models, a scenario where many traditional methods suffer performance degradation due to increased noise and conflicts.

The framework also exhibits exceptional robustness against cross-task/domain interference. When faced with noisy merging environments containing irrelevant models, conventional methods experienced catastrophic performance drops, whereas Evo-Merging maintained high performance. Furthermore, it demonstrates high data efficiency, achieving strong results with relatively small validation data sample sizes.

Looking Ahead

The introduction of black-box model merging and the Evo-Merging framework marks a significant step forward in making advanced AI capabilities more accessible and adaptable, especially in the context of Language-Model-as-a-Service. The researchers plan to generalize Evo-Merging to merge black-box multi-modal models in the future, designing evolutionary search strategies over cross-modal representations. This work opens new avenues for leveraging massive model repositories efficiently and effectively, even when their internal mechanisms remain hidden. You can read the full research paper here.

Meera Iyer (https://blogs.edgentiq.com)
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She is particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach out to her at: [email protected]
