TLDR: A new framework called Networked Mixture-of-Experts (NMoE) is proposed to efficiently deploy large AI models on mobile edge devices. It splits the AI model across devices, allowing them to collaboratively infer and share computational resources. A three-stage federated learning approach trains the system, balancing personalization, generalization, privacy, and communication efficiency, making advanced AI accessible on resource-limited devices.
Large Artificial Intelligence Models (LAMs), like the ones powering advanced language and vision tools, are becoming increasingly powerful. However, deploying these massive models directly onto everyday mobile devices and edge computing systems presents a significant challenge. These devices often have limited storage, processing power, and battery life, making it difficult to run complex AI operations efficiently.
Traditional approaches to scaling LAMs often involve a concept called Mixture-of-Experts (MoE). In a standard MoE, a large model is broken down into smaller, specialized ‘expert’ subnetworks. When data comes in, a ‘gating network’ decides which few experts are most relevant to process that specific piece of data, activating only a subset of the model. This significantly reduces the computational load compared to running the entire large model.
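The routing idea above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular MoE: the function names (`softmax`, `top_k_route`, `moe_forward`), the toy scalar experts, and the fixed gate scores are all assumptions made for the example. In a real model the gating network would compute the scores from the input itself.

```python
import math

def softmax(scores):
    """Convert raw gate scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and blend their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_scores, k))

# Toy experts (hypothetical): each one is just a scalar function.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
# Assumed gate scores for one input; a trained gating network would produce these.
gate_scores = [0.1, 2.0, -1.0, 1.5]

routes = top_k_route(gate_scores, k=2)
output = moe_forward(3.0, experts, gate_scores, k=2)
```

With `k=2`, only two of the four experts are evaluated for this input, which is where the compute savings come from.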
However, even with MoE, existing methods often assume that a complete MoE structure can be deployed on each individual client device. This assumption doesn’t hold true for resource-constrained mobile edge devices, which simply can’t handle all expert networks simultaneously, especially during training.
Introducing Networked Mixture-of-Experts (NMoE)
To address this, researchers have introduced a novel framework called Networked Mixture-of-Experts (NMoE). Its authors describe it as the first system designed to split and distribute an MoE across multiple mobile edge devices within a communication network. Instead of each device hosting the entire MoE, an NMoE system allows clients to infer collaboratively by distributing tasks to suitable neighboring devices based on their specialized expertise.
In the NMoE setup, each client device locally deploys three key components: a cross-shared feature extractor, a cross-shared gating network, and a personalized expert. During the inference phase (when the system is making predictions), a client first processes its input data through the feature extractor to create a compact representation. This representation is then passed to the gating network, which intelligently determines the most suitable experts to handle the data – these could be the client’s own local expert or experts located on neighboring client devices. The data is then distributed, processed by the selected experts, and the results are aggregated and sent back to the originating client.
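The inference flow just described can be mocked up in plain Python. This is a rough sketch under several assumptions, not the paper's code: the feature extractor, the linear gate weights, the three toy experts, and the function `nmoe_infer` are hypothetical stand-ins, a dictionary lookup stands in for sending data over the network, and the sketch assumes the compact representation (rather than the raw input) is what gets sent to neighbors.

```python
import math
from typing import Callable, Dict, List, Tuple

Features = List[float]

def extract_features(raw: List[float]) -> Features:
    """Stand-in shared feature extractor: (mean, range) of the raw input."""
    return [sum(raw) / len(raw), max(raw) - min(raw)]

def gate(features: Features, gate_weights: Dict[str, Features],
         k: int) -> List[Tuple[str, float]]:
    """Stand-in shared gating network: score every client's expert, keep top k."""
    scores = {name: sum(w * f for w, f in zip(ws, features))
              for name, ws in gate_weights.items()}
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    peak = max(scores[n] for n in top)
    exps = {n: math.exp(scores[n] - peak) for n in top}
    total = sum(exps.values())
    return [(n, exps[n] / total) for n in top]

def nmoe_infer(raw: List[float],
               experts: Dict[str, Callable[[Features], float]],
               gate_weights: Dict[str, Features], k: int = 2):
    """Extract features, route to k experts, aggregate the weighted replies."""
    features = extract_features(raw)
    routes = gate(features, gate_weights, k)
    # Calling experts[name] stands in for a network round trip to that client.
    replies = {name: experts[name](features) for name, _ in routes}
    prediction = sum(weight * replies[name] for name, weight in routes)
    return prediction, routes

# Hypothetical personalized experts hosted on clients A, B, and C.
experts = {
    "A": lambda f: f[0],
    "B": lambda f: f[1],
    "C": lambda f: f[0] + f[1],
}
# Assumed gate parameters, identical on every client because the gate is shared.
gate_weights = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [-1.0, -1.0]}

prediction, routes = nmoe_infer([1.0, 2.0, 6.0], experts, gate_weights, k=2)
```

Here the originating client could be any of A, B, or C: because the extractor and gate are shared, each client would make the same routing decision for the same input, and only the selected experts do any work.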
Smart Training for a Distributed System
Training such a distributed and collaborative system efficiently and privately is crucial. The NMoE framework proposes a three-stage federated learning approach:
- Stage 1: Feature Extractor Training: The shared feature extractor, which learns to create useful data representations, is trained using federated learning. This means multiple clients collaboratively train the model without sharing their raw data. Two methods are explored: FedCE, which uses a standard supervised learning approach, and FedSC, a self-supervised learning method that proves more robust for diverse and non-uniform data distributions.
- Stage 2: Personalized Expert Training: After the feature extractor is trained and ‘frozen’ (its parameters are fixed), each client independently trains its own personalized expert using its local, private dataset. This ensures that each expert is highly specialized and performs well on the client’s specific data, enhancing personalization and data privacy.
- Stage 3: Gating Network Training: Finally, the gating network, responsible for intelligently routing data to the correct experts, is trained. Several strategies are introduced: RanGate (random routing), RollGate (a local classifier that tries to identify suitable experts), and FedGate (a more general federated learning strategy that synchronizes gating network parameters across clients for aligned routing behavior without needing prior knowledge). FedGate generally offers superior performance in real-world scenarios.
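The difference between the federated stages (1 and 3) and the purely local stage (2) can be illustrated with a toy model. The sketch below is an assumption-heavy simplification: it uses plain weighted parameter averaging (the standard FedAvg scheme) as the aggregation rule, which the article does not confirm for FedCE, FedSC, or FedGate, and it replaces the real networks with a single weight `w` fitted to `y = w * x`. The names `fed_avg`, `local_step`, and `federated_round` are hypothetical, and the frozen feature extractor that Stage 2 relies on is not modeled.

```python
from typing import List, Tuple

def fed_avg(client_params: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Average parameter vectors, weighting each client by its dataset size."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [sum(p[j] * n for p, n in zip(client_params, client_sizes)) / total
            for j in range(dim)]

def local_step(params: List[float], data: List[Tuple[float, float]],
               lr: float = 0.1) -> List[float]:
    """One gradient step on a toy least-squares objective for y = w * x."""
    w = params[0]
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return [w - lr * grad]

def federated_round(global_params: List[float], client_datasets) -> List[float]:
    """Each client updates locally; only parameters (never raw data) are shared."""
    updates = [local_step(global_params, d) for d in client_datasets]
    sizes = [len(d) for d in client_datasets]
    return fed_avg(updates, sizes)

# Two clients with different (non-identical) local data.
client_datasets = [
    [(1.0, 2.0), (2.0, 4.0)],  # client 0: data follows y = 2x
    [(1.0, 3.0)],              # client 1: data follows y = 3x
]

# Stages 1 and 3 in miniature: shared parameters trained over federated rounds.
shared = [0.0]
for _ in range(50):
    shared = federated_round(shared, client_datasets)

# Stage 2 in miniature: each client fits its own expert locally; nothing is averaged.
personal = []
for data in client_datasets:
    w = [0.0]
    for _ in range(50):
        w = local_step(w, data)
    personal.append(w[0])
```

In this toy setup the shared weight settles at a compromise between the two clients' data, while each personal weight matches its own client's data, which mirrors the generalization versus personalization trade-off the three stages are meant to balance.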
Why NMoE Matters for the Future
The NMoE system offers a promising solution for the challenges of deploying large AI models in next-generation wireless networks and mobile edge computing. By distributing the computational load and leveraging collaborative intelligence, it allows resource-limited devices to participate in complex AI tasks. The federated learning approach ensures data privacy and communication efficiency, while the personalized experts adapt to diverse client data. This research provides valuable insights and benchmarks for training such systems, paving the way for more powerful and accessible AI on our mobile devices.


