TLDR: MU-SplitFed is a new algorithm for Split Federated Learning (SFL) that addresses the ‘straggler issue’, in which the slowest devices delay the entire training process. It uses an ‘unbalanced update’ mechanism that lets the server perform multiple local updates per client communication round, and it applies Zeroth-Order (ZO) optimization on clients to cut memory usage. This approach significantly reduces the number of communication rounds, can make training time independent of straggler delays, and improves memory efficiency, especially for large language models, outperforming existing methods in heterogeneous environments.
Split Federated Learning (SFL) is a powerful approach that combines the benefits of Federated Learning (FL) and Split Learning (SL) to train large models efficiently on edge devices. It allows for scalable training by distributing parts of a neural network across multiple clients and a central server. While FL enables parallel updates from many devices, it can be computationally heavy for individual edge devices. SL, on the other hand, offloads much of the computation to the server, reducing the burden on clients but often leading to high latency due to its sequential nature. SFL aims to strike a balance, making it a promising framework as models continue to grow in size.
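To make the split concrete, here is a minimal sketch (using PyTorch) of partitioning a toy network at a cut layer into a client-side part and a server-side part. The layer sizes and cut point are arbitrary illustrations, not taken from the paper.

```python
import torch
import torch.nn as nn

# Toy network; the "cut layer" decides which layers stay on the client
# and which are offloaded to the split server.
layers = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),   # runs on the client
    nn.Linear(64, 64), nn.ReLU(),   # runs on the server
    nn.Linear(64, 10),              # runs on the server
)

cut = 2                              # split point: layers[:cut] vs. layers[cut:]
client_model, server_model = layers[:cut], layers[cut:]

x = torch.randn(8, 32)               # a private batch held by the client
activations = client_model(x)        # only these activations cross the network
logits = server_model(activations)   # the server completes the forward pass
```

Only the activations at the cut layer (and, during training, signals flowing back the other way) are exchanged, which is why the choice of cut layer controls how much compute sits on each side.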
However, SFL faces a significant challenge known as the ‘straggler issue.’ In distributed systems, stragglers are the clients with the slowest computation or communication, and they can severely delay the entire training process. The problem is particularly acute in SFL because the server’s model updates depend on receiving information (such as activations) from all clients. This tight synchronization means everyone waits for the slowest participant, creating a critical bottleneck for scalability and efficiency. Existing solutions often fall short: some require specific architectural properties that modern transformer models do not always provide, while others introduce asynchronous updates that can degrade performance when client data is heterogeneous.
To tackle this persistent problem, researchers have introduced a novel algorithm called MU-SplitFed. This approach is designed to be resilient to stragglers by fundamentally changing how the server and clients update their models. MU-SplitFed uses a clever ‘unbalanced update’ mechanism, which allows the Split Server to perform multiple local optimization steps (τ updates) for every single communication round with the clients. This effectively decouples the training progress from the delays caused by slow clients, making the server more productive instead of waiting idly.
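As an illustration of the unbalanced update, the sketch below has a hypothetical split server run τ gradient steps on its sub-model using a single batch of activations received from a client. Whether the server reuses one batch or consumes several per round is an implementation detail not covered here, so treat this purely as a sketch of the idea.

```python
import torch

def server_round(server_model, activations, labels, tau=4, lr=1e-3):
    """One communication round on the split server.

    The server performs tau local SGD steps on its sub-model using the
    activations ("smashed data") received from a client -- the unbalanced
    update: tau server steps per single client communication round.
    (Illustrative sketch only; not the paper's exact update rule.)
    """
    activations = activations.detach()        # treat received activations as constants
    opt = torch.optim.SGD(server_model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(tau):                      # tau > 1: the server keeps working
        opt.zero_grad()                       # instead of waiting for the next client
        loss = loss_fn(server_model(activations), labels)
        loss.backward()
        opt.step()
    return server_model
```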
A key innovation in MU-SplitFed is the use of Zeroth-Order (ZO) optimization on the client side. ZO optimization is a gradient-free method: it estimates gradients from forward passes alone, so clients avoid backpropagation and the activation storage it requires, which sharply reduces memory and computational demands. This makes it well suited to resource-constrained environments. The overall training process involves two main phases. First, clients and the Split Server perform unbalanced ZO updates: clients send perturbed embeddings, and the server carries out its multiple local updates. Second, a central Fed Server aggregates the updated client-side models, and the Split Server aggregates its server-side models to form a new global model.
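To see why ZO is so memory-friendly, here is a minimal sketch of a two-point random-perturbation estimator (in the spirit of SPSA/MeZO): it needs only two forward passes and never builds a backward graph. This is a generic illustration of the technique, not the exact estimator used in MU-SplitFed.

```python
import torch

def zo_step(params, loss_fn, lr=1e-4, mu=1e-3):
    """One zeroth-order update via a two-point random-perturbation estimate.
    Only forward passes are used -- no backpropagation, no stored activation
    graph. Illustrative sketch, not MU-SplitFed's exact client update."""
    with torch.no_grad():                            # no autograd graph is built
        zs = [torch.randn_like(p) for p in params]   # one random direction

        for p, z in zip(params, zs):                 # evaluate loss at theta + mu*z
            p.add_(mu * z)
        loss_plus = loss_fn()

        for p, z in zip(params, zs):                 # evaluate loss at theta - mu*z
            p.sub_(2 * mu * z)
        loss_minus = loss_fn()

        for p, z in zip(params, zs):                 # restore original parameters
            p.add_(mu * z)

        # Finite-difference estimate of the directional derivative along z.
        g = (loss_plus - loss_minus) / (2 * mu)

        for p, z in zip(params, zs):                 # gradient-free SGD-style update
            p.sub_(lr * g * z)
```

Here `loss_fn` is a closure that runs a forward pass on the client's sub-model and returns a scalar loss; because nothing is ever backpropagated, the client never stores intermediate activations for gradient computation.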
The theoretical analysis of MU-SplitFed demonstrates a linear speedup with respect to τ: the number of communication rounds needed to converge shrinks proportionally as the server’s per-round update count τ grows. Crucially, with an appropriately chosen τ, the total training time can become independent of the straggler’s delay, which is a major step forward for SFL systems. The research also highlights an important connection between where the model is split (the ‘cut layer’) and the optimal unbalanced update ratio τ: aligning these two factors is essential for the best convergence, with deeper server-side models benefiting from larger τ values.
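A rough back-of-envelope calculation shows how this plays out; all numbers and the simple cost model below are illustrative assumptions, not figures from the paper. If the rounds needed shrink roughly linearly in τ while each round still pays the straggler’s delay once, the straggler term fades as τ grows.

```python
# Toy cost model (all numbers are made up for illustration):
#   time per round = straggler_delay + tau * server_step
#   rounds needed  ~ R0 / tau   (the "linear speedup" in communication rounds)
R0 = 1000               # rounds needed when tau = 1
straggler_delay = 5.0   # seconds the server waits for the slowest client per round
server_step = 0.1       # seconds per server-side local update

for tau in (1, 5, 20):
    rounds = R0 / tau
    total = rounds * (straggler_delay + tau * server_step)
    print(f"tau={tau:>2}: rounds={rounds:>6.0f}, wall-clock ~ {total:8.1f} s")

# As tau grows, the straggler term (R0 / tau) * straggler_delay shrinks, while
# the total server compute R0 * server_step stays constant -- so the slowest
# client's delay stops dominating training time.
```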
Experimental results on various benchmark datasets, including CIFAR-10, Fashion-MNIST, CINIC-10, and CIFAR-100, consistently show that MU-SplitFed outperforms baseline methods like vanilla SplitFed and GAS (a recent asynchronous SFL method) in the presence of stragglers. It achieves higher accuracy in less wall-clock time, demonstrating its practical effectiveness. Furthermore, when fine-tuning a large language model (OPT-1.3B) on the SST-2 dataset, MU-SplitFed significantly reduces client-side memory usage to just 1.05 GB, compared to 8.02 GB for FedAvg and 5.64 GB for FedLoRA. This memory efficiency, combined with straggler resilience, makes MU-SplitFed particularly promising for fine-tuning large language models on edge devices.
In conclusion, MU-SplitFed offers a simple yet highly effective solution to the long-standing straggler problem in Split Federated Learning. By leveraging unbalanced server-side updates and zeroth-order optimization, it reduces communication complexity, accelerates training, and, with a well-chosen τ, decouples training time from the slowest client’s speed. This framework not only improves efficiency and scalability for traditional SFL applications but also opens new avenues for fine-tuning large language models on resource-constrained edge devices. For more details, you can refer to the full research paper: Towards Straggler-Resilient Split Federated Learning: An Unbalanced Update Approach.


