TLDR: MU-SplitFed is a new algorithm for Split Federated Learning (SFL) that addresses the ‘straggler issue’, in which the slowest devices delay the entire training process. It uses an ‘unbalanced update’ mechanism that lets the server perform multiple local updates per client communication round, and it applies Zeroth-Order (ZO) optimization on clients to cut memory usage. This approach significantly reduces the number of communication rounds, can make training time independent of straggler delays, and improves memory efficiency, especially for large language models, outperforming existing methods in heterogeneous environments.
Split Federated Learning (SFL) is a powerful approach that combines the benefits of Federated Learning (FL) and Split Learning (SL) to train large models efficiently on edge devices. It allows for scalable training by distributing parts of a neural network across multiple clients and a central server. While FL enables parallel updates from many devices, it can be computationally heavy for individual edge devices. SL, on the other hand, offloads much of the computation to the server, reducing the burden on clients but often leading to high latency due to its sequential nature. SFL aims to strike a balance, making it a promising framework as models continue to grow in size.
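To make the split concrete, here is a minimal sketch (using PyTorch) of partitioning a toy network at a cut layer into a client-side part and a server-side part. The layer sizes and cut point are arbitrary illustrations, not taken from the paper.

```python
import torch
import torch.nn as nn

# Toy network; the "cut layer" decides which layers stay on the client
# and which are offloaded to the split server.
layers = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),   # runs on the client
    nn.Linear(64, 64), nn.ReLU(),   # runs on the server
    nn.Linear(64, 10),              # runs on the server
)

cut = 2                              # split point: layers[:cut] vs. layers[cut:]
client_model, server_model = layers[:cut], layers[cut:]

x = torch.randn(8, 32)               # a private batch held by the client
activations = client_model(x)        # only these activations cross the network
logits = server_model(activations)   # the server completes the forward pass
```

Only the activations at the cut layer (and, during training, signals flowing back the other way) are exchanged, which is why the choice of cut layer controls how much compute sits on each side.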
However, SFL faces a significant challenge known as the ‘straggler issue.’ In distributed systems, stragglers are the clients with the slowest computation or communication, and they can severely delay the entire training process. The problem is particularly acute in SFL because the server’s model updates depend on receiving information (such as activations) from all clients. This tight synchronization means everyone waits for the slowest participant, creating a critical bottleneck for scalability and efficiency. Existing solutions often fall short: some require specific architectural properties that modern transformer models do not always provide, while others introduce asynchronous updates that can degrade performance when client data is heterogeneous.
To tackle this persistent problem, researchers have introduced a novel algorithm called MU-SplitFed. This approach is designed to be resilient to stragglers by fundamentally changing how the server and clients update their models. MU-SplitFed uses a clever ‘unbalanced update’ mechanism, which allows the Split Server to perform multiple local optimization steps (τ updates) for every single communication round with the clients. This effectively decouples the training progress from the delays caused by slow clients, making the server more productive instead of waiting idly.
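As an illustration of the unbalanced update, the sketch below has a hypothetical split server run τ gradient steps on its sub-model using a single batch of activations received from a client. Whether the server reuses one batch or consumes several per round is an implementation detail not covered here, so treat this purely as a sketch of the idea.

```python
import torch

def server_round(server_model, activations, labels, tau=4, lr=1e-3):
    """One communication round on the split server.

    The server performs tau local SGD steps on its sub-model using the
    activations ("smashed data") received from a client -- the unbalanced
    update: tau server steps per single client communication round.
    (Illustrative sketch only; not the paper's exact update rule.)
    """
    activations = activations.detach()        # treat received activations as constants
    opt = torch.optim.SGD(server_model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(tau):                      # tau > 1: the server keeps working
        opt.zero_grad()                       # instead of waiting for the next client
        loss = loss_fn(server_model(activations), labels)
        loss.backward()
        opt.step()
    return server_model
```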
A key innovation in MU-SplitFed is the use of Zeroth-Order (ZO) optimization on the client side. ZO optimization is a gradient-free method: it estimates gradients from forward passes alone, so clients avoid backpropagation and the activation storage it requires, which sharply reduces memory and computational demands. This makes it well suited to resource-constrained environments. The overall training process involves two main phases. First, clients and the Split Server perform unbalanced ZO updates: clients send perturbed embeddings, and the server carries out its multiple local updates. Second, a central Fed Server aggregates the updated client-side models, and the Split Server aggregates its server-side models to form a new global model.
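To see why ZO is so memory-friendly, here is a minimal sketch of a two-point random-perturbation estimator (in the spirit of SPSA/MeZO): it needs only two forward passes and never builds a backward graph. This is a generic illustration of the technique, not the exact estimator used in MU-SplitFed.

```python
import torch

def zo_step(params, loss_fn, lr=1e-4, mu=1e-3):
    """One zeroth-order update via a two-point random-perturbation estimate.
    Only forward passes are used -- no backpropagation, no stored activation
    graph. Illustrative sketch, not MU-SplitFed's exact client update."""
    with torch.no_grad():                            # no autograd graph is built
        zs = [torch.randn_like(p) for p in params]   # one random direction

        for p, z in zip(params, zs):                 # evaluate loss at theta + mu*z
            p.add_(mu * z)
        loss_plus = loss_fn()

        for p, z in zip(params, zs):                 # evaluate loss at theta - mu*z
            p.sub_(2 * mu * z)
        loss_minus = loss_fn()

        for p, z in zip(params, zs):                 # restore original parameters
            p.add_(mu * z)

        # Finite-difference estimate of the directional derivative along z.
        g = (loss_plus - loss_minus) / (2 * mu)

        for p, z in zip(params, zs):                 # gradient-free SGD-style update
            p.sub_(lr * g * z)
```

Here `loss_fn` is a closure that runs a forward pass on the client's sub-model and returns a scalar loss; because nothing is ever backpropagated, the client never stores intermediate activations for gradient computation.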
The theoretical analysis of MU-SplitFed demonstrates a linear speedup with respect to τ: the number of communication rounds needed to converge shrinks proportionally as the server’s per-round update count τ grows. Crucially, with an appropriately chosen τ, the total training time can become independent of the straggler’s delay, which is a major step forward for SFL systems. The research also highlights an important connection between where the model is split (the ‘cut layer’) and the optimal unbalanced update ratio τ: aligning these two factors is essential for the best convergence, with deeper server-side models benefiting from larger τ values.
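A rough back-of-envelope calculation shows how this plays out; all numbers and the simple cost model below are illustrative assumptions, not figures from the paper. If the rounds needed shrink roughly linearly in τ while each round still pays the straggler’s delay once, the straggler term fades as τ grows.

```python
# Toy cost model (all numbers are made up for illustration):
#   time per round = straggler_delay + tau * server_step
#   rounds needed  ~ R0 / tau   (the "linear speedup" in communication rounds)
R0 = 1000               # rounds needed when tau = 1
straggler_delay = 5.0   # seconds the server waits for the slowest client per round
server_step = 0.1       # seconds per server-side local update

for tau in (1, 5, 20):
    rounds = R0 / tau
    total = rounds * (straggler_delay + tau * server_step)
    print(f"tau={tau:>2}: rounds={rounds:>6.0f}, wall-clock ~ {total:8.1f} s")

# As tau grows, the straggler term (R0 / tau) * straggler_delay shrinks, while
# the total server compute R0 * server_step stays constant -- so the slowest
# client's delay stops dominating training time.
```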
Experimental results on various benchmark datasets, including CIFAR-10, Fashion-MNIST, CINIC-10, and CIFAR-100, consistently show that MU-SplitFed outperforms baseline methods like vanilla SplitFed and GAS (a recent asynchronous SFL method) in the presence of stragglers. It achieves higher accuracy in less wall-clock time, demonstrating its practical effectiveness. Furthermore, when fine-tuning a large language model (OPT-1.3B) on the SST-2 dataset, MU-SplitFed significantly reduces client-side memory usage to just 1.05 GB, compared to 8.02 GB for FedAvg and 5.64 GB for FedLoRA. This memory efficiency, combined with straggler resilience, makes MU-SplitFed particularly promising for fine-tuning large language models on edge devices.
In conclusion, MU-SplitFed offers a simple yet highly effective solution to the long-standing straggler problem in Split Federated Learning. By leveraging unbalanced server-side updates and zeroth-order optimization, it reduces communication complexity, accelerates training, and, with a well-chosen τ, decouples training time from the slowest client’s speed. This framework not only improves efficiency and scalability for traditional SFL applications but also opens new avenues for fine-tuning large language models on resource-constrained edge devices. For more details, you can refer to the full research paper: Towards Straggler-Resilient Split Federated Learning: An Unbalanced Update Approach.


