TLDR: A new quantum federated learning (QFL) framework, mmQFL, is introduced to handle multiple data types (multimodal data) and missing information. It uses quantum entanglement for data fusion and a “Missing Modality Agnostic” (MMA) mechanism to maintain performance when data is incomplete. Simulations show mmQFL significantly outperforms existing methods in accuracy and robustness, making QFL more applicable to real-world, diverse datasets.
Quantum Federated Learning (QFL) is an exciting new field that combines the power of quantum computing with federated learning. Federated learning is a privacy-preserving method where multiple clients train a machine learning model locally and only share model parameters with a central server, keeping their raw data private. Quantum computing, with its unique principles like superposition and entanglement, offers the potential to accelerate complex machine learning tasks that classical computers struggle with.
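To make that exchange concrete, below is a minimal sketch of the server-side aggregation step in a FedAvg-style setup, written in Python with NumPy. The function name, uniform client weighting, and array shapes are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def fed_avg(client_params):
    """Average locally trained parameter vectors into one global model.

    Only these parameters ever cross the network; each client's raw
    data stays on its own device, which is the privacy guarantee of
    federated learning.
    """
    return np.mean(np.stack(client_params), axis=0)

# One illustrative round: three clients, each holding a 4-parameter model.
local_updates = [np.random.rand(4) for _ in range(3)]
global_params = fed_avg(local_updates)
print(global_params)
```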
However, current QFL frameworks are largely unimodal, handling only a single type of data. This limits their use in real-world applications, which often involve several data types at once, such as images, audio, and text. A further challenge in multimodal learning is missing data: when some sensors or data sources are unavailable, model performance can degrade severely.
To address these critical gaps, researchers have introduced a groundbreaking new framework called multimodal Quantum Federated Learning (mmQFL). This is the first time a multimodal approach has been specifically designed for the QFL setting. A key innovation in mmQFL is its use of an intermediate fusion technique that leverages quantum entanglement to combine information from different data types.
Furthermore, mmQFL introduces a clever mechanism called Missing Modality Agnostic (MMA), which tackles the problem of missing data by isolating untrained quantum circuits. When a modality is missing, MMA keeps the corresponding qubits in a fixed, known state, preventing corrupted data from degrading the overall model’s performance and keeping training stable.
How mmQFL Works
In the mmQFL framework, a network of quantum processors, acting as clients, works together with a central quantum server to train a shared multimodal quantum machine learning model. Each client has a dataset with multiple modalities, and each modality is processed by its own dedicated Quantum Neural Network (QNN).
The features of each modality are first encoded into quantum states. An intermediate quantum fusion layer with trainable parameters then entangles these modality-specific states, allowing cross-modal correlations to form directly in the quantum state space. After fusion, measurements produce classical outputs that are used for classification and for computing the model’s loss.
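As a rough illustration of intermediate quantum fusion, here is a minimal PennyLane sketch: each modality is angle-encoded onto its own qubits, a trainable layer of rotations followed by a ring of CNOTs then entangles qubits across modality boundaries, and Pauli-Z expectation values serve as the classical outputs. The qubit counts, gate choices, and parameter shapes are assumptions made for illustration, not the paper’s actual circuit.

```python
import pennylane as qml
from pennylane import numpy as np

N_MODALITIES, QUBITS_PER_MOD = 3, 2   # e.g. text, audio, image; sizes are illustrative
n_qubits = N_MODALITIES * QUBITS_PER_MOD
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def fused_circuit(features, fusion_weights):
    # Modality-specific encoding: each modality's features land on its own qubits.
    for m in range(N_MODALITIES):
        for q in range(QUBITS_PER_MOD):
            qml.RY(features[m, q], wires=m * QUBITS_PER_MOD + q)

    # Trainable intermediate fusion layer: single-qubit rotations followed by a
    # ring of CNOTs that entangles qubits across modality boundaries.
    for w in range(n_qubits):
        qml.Rot(*fusion_weights[w], wires=w)
    for w in range(n_qubits):
        qml.CNOT(wires=[w, (w + 1) % n_qubits])

    # Measurement produces the classical outputs used for classification and loss.
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

features = np.random.uniform(0, np.pi, (N_MODALITIES, QUBITS_PER_MOD))
fusion_weights = np.random.uniform(0, np.pi, (n_qubits, 3))
print(fused_circuit(features, fusion_weights))
```

Training would then attach a classification loss to these expectation values and optimize the encoding and fusion parameters together.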
When a modality is missing, the MMA mechanism comes into play. It uses a “context vector” to track which modalities are available. If a modality is absent, its qubits are held in the zero state via “no-op gates,” so the missing input cannot introduce errors or “garbage values” that would corrupt the quantum states of the properly trained modalities. After local training, clients send their updated models to the quantum server for aggregation, and the refined global model is shared back with the clients.
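Continuing the sketch above, one plausible reading of the MMA mechanism is to consult the context vector at encoding time and apply no encoding gates for an absent modality, leaving its qubits in the known |0⟩ state. This is an interpretation under stated assumptions, not the paper’s exact gate sequence.

```python
import pennylane as qml
from pennylane import numpy as np

N_MODALITIES, QUBITS_PER_MOD = 3, 2   # same illustrative setup as the fusion sketch
n_qubits = N_MODALITIES * QUBITS_PER_MOD
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def mma_circuit(features, fusion_weights, context):
    # context[m] is 1 if modality m is present for this sample, 0 if missing.
    for m in range(N_MODALITIES):
        if context[m]:
            for q in range(QUBITS_PER_MOD):
                qml.RY(features[m, q], wires=m * QUBITS_PER_MOD + q)
        # Missing modality: no encoding gate is applied (the "no-op"), so its
        # qubits stay in the known |0> state. The fusion layer then acts on a
        # fixed, data-independent state rather than on garbage values.

    for w in range(n_qubits):
        qml.Rot(*fusion_weights[w], wires=w)
    for w in range(n_qubits):
        qml.CNOT(wires=[w, (w + 1) % n_qubits])
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

# Example: the text modality (index 0) is missing for this client's sample.
context = [0, 1, 1]
features = np.random.uniform(0, np.pi, (N_MODALITIES, QUBITS_PER_MOD))
fusion_weights = np.random.uniform(0, np.pi, (n_qubits, 3))
print(mma_circuit(features, fusion_weights, context))
```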
Experimental Validation
The mmQFL framework was rigorously tested using the Carnegie Mellon University Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) dataset, which includes synchronized text, audio, and image data. The simulations involved ten noisy intermediate-scale quantum (NISQ) devices for local data processing and a single quantum server.
Initial comparisons showed that quantum models (iQNN for images, aQNN for audio, tQNN for text) generally outperformed their classical counterparts (CNN for images, LSTM for audio and text) in emotion detection tasks, often converging faster or achieving better accuracy.
An ablation study showed that performance improved with more qubits, with 10 qubits yielding the best results. The optimal number of quantum layers varied by modality, suggesting that different data types benefit from different circuit depths. Increasing the number of clients also generally improved accuracy, underscoring the scalability and robustness of the federated setup, and larger datasets consistently led to better model performance.
Crucially, the integration of multiple modalities in mmQFL consistently increased classification accuracy compared to single-modality approaches. The MMA mechanism proved highly effective in mitigating the negative impact of missing data. For instance, when 20% of text data was missing, MMA improved accuracy from 64.64% to 75.30% in an independent and identically distributed (IID) data setting.
In a final comparison against other state-of-the-art methods, including unimodal, classical multimodal, and other quantum federated learning approaches, mmQFL achieved the highest combined accuracy. With 10% of data missing from all modalities, mmQFL reached 85.61% accuracy in IID settings and 79.27% in non-IID conditions. This represents a significant improvement of 6.84% in IID and 7.25% in non-IID data distributions over existing methods.
Conclusion
The mmQFL framework marks a significant step forward in quantum machine learning (QML). By integrating multiple data modalities and introducing a robust mechanism for handling missing data, it improves the performance and stability of QML models in distributed, privacy-preserving environments. This research paves the way for more practical and resilient QML applications in the real world. You can read the full research paper here.


