TLDR: FedS2R is a novel one-shot federated domain generalization framework for synthetic-to-real semantic segmentation in autonomous driving. It addresses data privacy and the domain gap by using inconsistency-driven data augmentation with diffusion models and a multi-client knowledge distillation scheme with feature fusion. Experiments show that FedS2R’s global model significantly outperforms individual client models and other federated baselines on real-world datasets, achieving performance close to a model trained with full data access, all without sharing raw client data or requiring server-side annotations.
In the rapidly evolving world of autonomous driving, accurate perception of the environment is paramount. Semantic segmentation, a key technology, allows self-driving cars to understand road scenes pixel by pixel, distinguishing between objects like vehicles, pedestrians, and infrastructure. However, training these sophisticated models typically demands vast datasets with meticulous, pixel-level human annotations, a process that is both costly and time-consuming.
To sidestep this laborious manual annotation, many researchers turn to computer-generated synthetic datasets, whose simulated environments provide pixel-level labels automatically. Yet a significant challenge arises: the ‘domain gap’ between synthetic and real-world data. Models trained solely on synthetic data often struggle to generalize when deployed in real-world scenarios, limiting their practical application.
Adding to this complexity are the stringent data privacy and intellectual property concerns. Synthetic datasets, often developed by companies or academic institutions, come with strict licensing agreements that prohibit redistribution. Furthermore, real-world driving data, which might contain identifiable geographic features or sensor-specific information, raises substantial privacy issues if shared without proper control. This means that unrestricted data sharing, especially across different divisions of a global autonomous driving company, is often impractical.
To address these intertwined issues of domain gap and data privacy, a new research paradigm called federated domain generalization has emerged. This approach combines federated learning with domain generalization. In a federated learning setup, data remains local to individual clients (e.g., different companies or regional divisions), and only model weights or gradients are shared with a central server. This preserves data privacy while still enabling collaborative model training.
While federated domain generalization has shown promise in image classification, its application to semantic segmentation in autonomous driving has remained largely unexplored. Moreover, many existing federated learning methods require multiple rounds of communication and active participation from clients, which can be impractical in real-world scenarios where clients might only share their models once.
Introducing FedS2R: A One-Shot Solution
A recent research paper, ‘FedS2R: One-Shot Federated Domain Generalization for Synthetic-to-Real Semantic Segmentation in Autonomous Driving’, proposes a novel framework to tackle these challenges. FedS2R is the first one-shot federated domain generalization framework specifically designed for synthetic-to-real semantic segmentation in autonomous driving. ‘One-shot’ means it requires only a single round of communication between clients and the server, making it highly practical for deployment.
FedS2R operates in two main stages:
1. Inconsistency-driven Data Augmentation: Client models, trained on their private synthetic datasets, often show inconsistent predictions on the same real-world images, especially for less common or ‘unstable’ classes (like trains or motorcycles). To address this, FedS2R quantifies this inconsistency. For classes where the client models disagree significantly, it uses a large language model (like ChatGPT) to generate descriptive prompts. These prompts are then fed into a pre-trained diffusion model (like Stable Diffusion XL) to synthesize new, photorealistic images containing the unstable classes. These generated images are added to the server’s dataset, enriching the representation of challenging classes without requiring any human annotations (see the first sketch after this list).
2. Multi-client Knowledge Distillation with Feature Fusion: In this stage, FedS2R distills the knowledge of the multiple client models into a single, robust global model. The server receives the trained models from the clients but never accesses their raw data; instead, it uses its own unannotated dataset, augmented with the newly generated images, to train the global model. The client models’ internal features are averaged, and their classification and mask predictions are combined. The global model then learns to mimic these fused ‘soft predictions’ through knowledge distillation, combining Kullback-Leibler (KL) divergence for classification with Binary Cross-Entropy (BCE) and Dice losses for mask prediction, so that the global model captures both class-level knowledge and accurate object shapes (see the second sketch after this list).
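The paper's code is not reproduced here, so the following is a minimal Python sketch of how the class-wise inconsistency score might be computed across client models; the scoring function, model interfaces, and the example prompt are illustrative assumptions rather than the authors' implementation.

```python
import torch

def classwise_inconsistency(client_models, images, num_classes):
    """Score how much the client models disagree on each class.

    client_models: list of segmentation nets returning (B, C, H, W) logits.
    images: a batch of unlabeled real-world server images, (B, 3, H, W).
    Returns a (num_classes,) tensor; higher means a less stable class.
    """
    with torch.no_grad():
        # Hard predictions from every client, stacked to (K, B, H, W).
        preds = torch.stack([m(images).argmax(dim=1) for m in client_models])
    scores = torch.zeros(num_classes)
    for c in range(num_classes):
        present = preds == c                # (K, B, H, W) boolean masks
        any_client = present.any(dim=0)     # pixels at least one client labels c
        all_clients = present.all(dim=0)    # pixels every client labels c
        union = any_client.sum().item()
        if union > 0:
            # 1 - agreement ratio: 0 = full consensus, 1 = total disagreement.
            scores[c] = 1.0 - all_clients.sum().item() / union
    return scores

# Classes with high scores are then described in LLM-generated prompts and fed
# to a diffusion model (e.g. Stable Diffusion XL) to synthesize extra images,
# e.g. "a photorealistic city street with a tram crossing an intersection".
```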
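Likewise, here is a hedged sketch of the distillation objective described above, assuming the fused teacher signal is a simple average of the client models' outputs; the tensor shapes, temperature `T`, and loss weights are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_cls, student_mask, teacher_cls_list,
                      teacher_mask_list, T=2.0, w_bce=1.0, w_dice=1.0):
    """Distill fused multi-client knowledge into the global (student) model.

    student_cls / entries of teacher_cls_list: (B, num_classes) class logits.
    student_mask / entries of teacher_mask_list: (B, 1, H, W) mask logits.
    Fusion here is a plain average over clients, mirroring the paper's idea
    of combining client features and predictions.
    """
    teacher_cls = torch.stack(teacher_cls_list).mean(dim=0)
    teacher_mask = torch.stack(teacher_mask_list).mean(dim=0)

    # KL divergence between temperature-softened class distributions.
    kl = F.kl_div(
        F.log_softmax(student_cls / T, dim=-1),
        F.softmax(teacher_cls / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # BCE + Dice on mask predictions against the fused soft teacher masks.
    t_prob = torch.sigmoid(teacher_mask)
    s_prob = torch.sigmoid(student_mask)
    bce = F.binary_cross_entropy_with_logits(student_mask, t_prob)
    inter = (s_prob * t_prob).sum(dim=(-2, -1))
    dice = 1.0 - (2 * inter + 1.0) / (
        s_prob.sum(dim=(-2, -1)) + t_prob.sum(dim=(-2, -1)) + 1.0
    )

    return kl + w_bce * bce + w_dice * dice.mean()
```

The KL term transfers class-level knowledge, while the BCE and Dice terms push the student's masks toward the fused teacher masks, which is why the combination captures both semantics and object shape.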
Experimental Success
The effectiveness of FedS2R was rigorously tested on five real-world datasets: Cityscapes, BDD100K, Mapillary, IDD, and ACDC. These datasets represent diverse driving conditions and scenarios. The results were compelling: the global model trained with FedS2R consistently outperformed individual client models and was only marginally behind a theoretical ‘upper-bound’ model that had simultaneous access to all client data. For instance, in one configuration, FedS2R achieved a mean Intersection over Union (mIoU) of 58.5, significantly better than the baseline federated learning approach (FedAvg) and individual client models.
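For reference, mean Intersection over Union is the standard segmentation metric behind these numbers; below is a minimal sketch of its computation from a confusion matrix (the standard definition, not code from the paper).

```python
import numpy as np

def mean_iou(conf):
    """mIoU from a (C, C) confusion matrix (rows = ground truth, cols = prediction)."""
    intersection = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    valid = union > 0   # ignore classes absent from both ground truth and predictions
    return (intersection[valid] / union[valid]).mean()
```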
The ablation studies further confirmed the importance of each component of FedS2R. Both the inconsistency-driven data augmentation and the feature fusion mechanism contributed meaningfully to the overall performance, demonstrating that their combined effect leads to superior generalization across different and challenging real-world driving environments.
In conclusion, FedS2R represents a significant step forward in applying federated learning to semantic segmentation for autonomous driving. By enabling collaborative training without compromising data privacy and effectively bridging the synthetic-to-real domain gap in a one-shot manner, it offers a practical and powerful solution for developing more robust and generalizable perception systems for self-driving vehicles.


