
FLEX OLMO: Enabling Private and Flexible Language Model Training

TLDR: FLEX OLMO is a new language model architecture that allows distributed training on private, unshared datasets using a Mixture-of-Experts (MoE) approach. Each expert is trained independently with a public model as an anchor, and a novel domain-informed router integrates them without joint training. This enables flexible data inclusion and exclusion at inference time, offering significant performance gains (a 41% average relative improvement) over public-only models and existing merging methods, while respecting data privacy.

In the evolving landscape of artificial intelligence, a new class of language models, known as FLEX OLMO, is emerging to address critical challenges related to data privacy and flexible data use. This innovative approach allows for the training of large language models without the need for data sharing, a significant hurdle for organizations dealing with sensitive or proprietary information.

Traditional language model training typically requires all data to be centrally aggregated, making it difficult for entities like healthcare institutions or financial firms to leverage their valuable, domain-specific datasets due to regulations such as HIPAA and GDPR, or intellectual property concerns. FLEX OLMO offers a solution by enabling distributed training where different parts of the model, called “experts,” are independently trained on closed datasets that never leave their owners’ local environments.

At its core, FLEX OLMO utilizes a Mixture-of-Experts (MoE) architecture. Unlike standard MoE models where experts are trained jointly on a combined dataset, FLEX OLMO’s experts are trained asynchronously and independently. Each data owner trains their own expert module using a shared “public model” as an anchor point. This unique training algorithm ensures that these independently trained experts can coordinate effectively when integrated into the larger model, even without ever seeing each other’s data.
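The paper's exact training recipe isn't reproduced here, but the anchoring idea can be sketched in a few lines. In the toy helper below (`train_local_expert` is a hypothetical name, and each expert is reduced to a single linear map), a data owner copies the shared public weights as initialization and then fits only the local copy on local data:

```python
import numpy as np

# Hypothetical, simplified sketch: each expert is a single linear map,
# and a least-squares objective stands in for the real LM loss.
def train_local_expert(public_W, local_X, local_Y, lr=0.01, steps=200):
    """Copy the public expert's weights as an anchor, then run plain
    gradient descent on local data that never leaves the owner's site."""
    W = public_W.copy()  # anchor initialization from the shared public model
    for _ in range(steps):
        pred = local_X @ W
        grad = local_X.T @ (pred - local_Y) / len(local_X)
        W -= lr * grad  # only the local copy changes; public_W is untouched
    return W
```

In the real model the experts are transformer feed-forward modules and the objective is language modeling, but the key point survives the simplification: the public weights are read, never written, so every owner trains against the same fixed anchor.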

A key innovation is the “domain-informed router.” In typical MoE setups, the router, which decides which expert processes each piece of information, requires joint training on all data. FLEX OLMO bypasses this by assigning each expert a unique router embedding, initialized from its domain and fine-tuned during individual expert training. These embeddings are then simply combined to form the complete router during inference, eliminating the need for shared data during this crucial step.
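As a rough sketch (not the paper's implementation), routing with independently contributed embeddings might look like the following; `route` and `top_k` are illustrative names. Each owner ships one embedding alongside their expert, and at integration time the embeddings are simply stacked into a scoring matrix, with no joint training on pooled data:

```python
import numpy as np

# Hypothetical sketch of a domain-informed router. Each data owner
# contributes one router embedding learned alongside their expert;
# combining them is just stacking the vectors into a matrix.
def route(hidden, router_embeddings, top_k=2):
    """Score each expert by dot product and select the top-k."""
    R = np.stack(router_embeddings)          # (num_experts, d_model)
    logits = R @ hidden                      # one score per expert
    top = np.argsort(logits)[-top_k:][::-1]  # indices of the best experts
    weights = np.exp(logits[top] - logits[top].max())
    return top, weights / weights.sum()      # softmax over selected experts
```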

The flexibility extends to inference as well. FLEX OLMO supports “data-flexible inference,” meaning that expert parameters, and with them the influence of their associated data, can be included or excluded at inference time without any further training. This provides strong guarantees for data opt-out, allowing users to remove the influence of certain data based on licensing or permission requirements. For example, if a user doesn’t have rights to data from a specific domain, that expert can simply be deactivated.
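Opt-out can be illustrated with a small self-contained sketch (hypothetical names, not the paper's code): excluding a domain amounts to filtering that expert's router embedding (and weights) out before scoring, so no retraining is involved.

```python
import numpy as np

# Hypothetical sketch of data opt-out at inference time: routing
# happens only among the permitted experts.
def route_with_optout(hidden, router_embeddings, excluded=(), top_k=2):
    """Drop excluded experts, then route among the remainder."""
    allowed = [i for i in range(len(router_embeddings)) if i not in excluded]
    R = np.stack([router_embeddings[i] for i in allowed])
    logits = R @ hidden
    order = np.argsort(logits)[-top_k:][::-1]
    chosen = [allowed[i] for i in order]   # map back to global expert ids
    weights = np.exp(logits[order] - logits[order].max())
    return chosen, weights / weights.sum()
```

Because exclusion is a pure inference-time filter, an owner's opt-out takes effect immediately, without touching any other expert's weights.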

To validate their approach, the researchers curated a dataset called FLEX MIX, which includes a public training set and seven simulated closed domain-specific sets, such as news, academic papers, code, and even Reddit content that is no longer publicly available. Evaluations on 31 diverse downstream tasks showed impressive results. FLEX OLMO demonstrated an average 41% relative improvement over models trained solely on public data. It also outperformed prior model merging techniques by 10.1% on average and even surpassed standard MoE models trained without data restrictions, using the same computational resources.

The model’s behavior analysis revealed that the router effectively activates the most relevant domain expert for specific inputs, while also frequently engaging the public expert, highlighting the synergistic design. The performance stabilizes after activating a few experts, suggesting efficient operation. Furthermore, the data opt-out mechanism was empirically validated, showing that removing an expert primarily impacts performance on its specific domain, with minimal effect on unrelated tasks.

Addressing concerns about data extraction, the research paper also includes an empirical assessment. While a small, non-zero fraction of data might be extractable from models that include weights trained on specific data, the overall extraction rate remains low. For highly sensitive data, the authors recommend using differentially private (DP) learning methods during expert training, which can be applied independently by each data owner.
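The paper doesn't prescribe a specific DP recipe, but the kind of step each owner could apply locally can be sketched in DP-SGD style (hypothetical helper, illustrative constants): clip each example's gradient to a fixed norm, average, and add calibrated Gaussian noise before the weight update.

```python
import numpy as np

# Hypothetical DP-SGD-style gradient step a data owner could apply
# locally during expert training. clip_norm and noise_mult are
# illustrative values, not tuned settings from the paper.
def dp_gradient(per_example_grads, clip_norm=1.0, noise_mult=1.0, rng=None):
    """Clip each example's gradient, average, and add Gaussian noise."""
    rng = rng or np.random.default_rng()
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(clipped),
                       size=mean.shape)
    return mean + noise
```

Because clipping and noising happen inside each owner's training loop, no coordination with other participants is needed to apply the mechanism.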


FLEX OLMO represents a significant step forward for organizations in regulated industries or those with sensitive data, enabling them to benefit from closed datasets while maintaining strict control over their information. This work paves the way for broader collaboration in AI research by allowing contributions without compromising data privacy. The full research paper is available online.

Dev Sundaram
