Mixture of Weather Experts: A New Approach to Enhance Forecast Accuracy

TLDR: The MoWE (Mixture of Weather Experts) framework combines outputs from multiple existing AI weather models using a Vision Transformer-based gating network. This approach dynamically weights expert contributions based on lead time and location, achieving up to 10% lower RMSE than the best individual AI model for 2-day forecasts. It’s computationally efficient and offers a scalable way to improve weather prediction by leveraging the strengths of diverse models.

In the evolving landscape of weather prediction, data-driven models have made significant strides, yet recent progress has shown signs of plateauing. To overcome these limitations, researchers have introduced a novel approach called Mixture of Weather Experts (MoWE).

Instead of developing entirely new forecasting models, MoWE focuses on optimally combining the outputs of existing, high-performing models. This strategy improves accuracy at a much lower computational cost than training a new expert model from scratch.

At the heart of the MoWE system is a Vision Transformer-based gating network. This network learns to assign a weight to each “expert” model’s contribution at every grid point, and it adjusts these weights according to the forecast lead time. The result is a single deterministic forecast that consistently achieves a lower Root Mean Squared Error (RMSE) than any of its component models.
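The per-grid-point weighting described above can be sketched in Python with plain nested lists. This is a minimal illustration under stated assumptions, not the authors’ implementation: the names `combine_experts` and `softmax`, the list-based grid layout, and the softmax normalization of the gate scores across experts are all hypothetical. The additive bias map is included because the gating network also outputs one.

```python
import math
from typing import List

Grid = List[List[float]]  # one 2D field, indexed [row][col]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: turns scores into weights summing to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_experts(forecasts: List[Grid], scores: List[Grid], bias: Grid) -> Grid:
    """Blend K expert forecasts independently at every grid point.

    forecasts[k][i][j]: expert k's forecast at grid point (i, j).
    scores[k][i][j]: gate score for expert k at (i, j), as a gating network
        might produce for one lead time. Normalizing them with a softmax
        across experts is an assumption made for this sketch.
    bias[i][j]: additive correction map at (i, j).
    """
    rows, cols = len(bias), len(bias[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            weights = softmax([s[i][j] for s in scores])
            blended = sum(w * f[i][j] for w, f in zip(weights, forecasts))
            out[i][j] = blended + bias[i][j]
    return out
```

With equal scores at every point, the softmax gives each expert the same weight and the blend reduces to the simple mean of the experts; a gate that produces different scores per grid point and lead time is what lets the combination depart from that mean.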

The gains are substantial: MoWE achieves up to 10% lower RMSE than the best-performing individual AI weather model at a 2-day forecast horizon, and it also improves on a simple average of the experts’ predictions. The framework offers a computationally efficient, scalable way to advance data-driven weather prediction by getting more out of existing leading forecast models.

The paper describes how MoWE produces its forecast by dynamically weighting the contributions of pre-existing expert models. The weights come from the gating network, a deep neural network whose inputs are all of the expert forecasts, the forecast lead time, and, for probabilistic variants, an optional noise vector. The gating network is built from Vision Transformer blocks: it processes a composite image formed by stacking the experts’ forecast maps and outputs a pixel-by-pixel weight map for each expert, plus a final bias map.
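The input side of the gating network can be sketched in the same style: stack every expert’s forecast maps into one multi-channel image, then cut it into non-overlapping patches, as Vision Transformers do before their Transformer blocks. The helpers `build_composite` and `patchify` are hypothetical, and encoding lead time as one constant extra channel is an assumption for illustration; the source does not say how lead time or the optional noise vector are fed in.

```python
from typing import List

Grid = List[List[float]]  # one 2D field, indexed [row][col]


def build_composite(expert_maps: List[List[Grid]], lead_time_h: float,
                    max_lead_h: float = 48.0) -> List[Grid]:
    """Stack every expert's variable maps into one C x H x W 'image'.

    expert_maps[k][v] is expert k's forecast map for variable v.
    Assumption for this sketch: lead time is appended as one constant
    channel scaled to [0, 1]; the real encoding is not specified here.
    """
    channels = [grid for expert in expert_maps for grid in expert]
    rows, cols = len(channels[0]), len(channels[0][0])
    lead = lead_time_h / max_lead_h
    channels.append([[lead] * cols for _ in range(rows)])
    return channels


def patchify(image: List[Grid], patch: int) -> List[List[float]]:
    """Cut a C x H x W image into non-overlapping patch tokens, ViT-style.

    Each token flattens one patch across all channels, so it has
    C * patch * patch values. H and W must be divisible by `patch`.
    """
    n_ch, rows, cols = len(image), len(image[0]), len(image[0][0])
    if rows % patch or cols % patch:
        raise ValueError("grid size must be divisible by the patch size")
    tokens = []
    for r0 in range(0, rows, patch):
        for c0 in range(0, cols, patch):
            tokens.append([image[c][r][q]
                           for c in range(n_ch)
                           for r in range(r0, r0 + patch)
                           for q in range(c0, c0 + patch)])
    return tokens
```

With three experts, two variables each, and a 4 x 4 grid, `build_composite` returns 7 channels, and `patchify(composite, 2)` yields 4 tokens of 28 values each. The Transformer blocks and the head that maps tokens back to per-pixel weight and bias maps are omitted. Because the channel count grows with experts times variables, the sketch also shows why the authors consider simple channel concatenation hard to scale to many experts.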

For this preliminary study, three expert models were chosen: Pangu, Aurora, and FCN3. Pangu uses a 3D data cube approach to capture complex weather patterns. Aurora, built on a Swin Transformer, is pretrained on diverse atmospheric data and then fine-tuned. FCN3 is a probabilistic model based on a spherical neural operator and designed to minimize the Continuous Ranked Probability Score (CRPS); its single-member deterministic scores may lag behind those of other models, but its ensemble performance is competitive.

The MoWE model was trained using 2-day forecast trajectories generated by each expert model, initialized at various timesteps of ERA5 data from 1980 to 2014. The training objective was to minimize the Mean Squared Error (MSE) between its prediction and the ground truth. Testing was conducted using data from 2015.
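The training objective can be illustrated with a deliberately simplified model: one scalar weight per expert, shared across all grid points, fitted by plain gradient descent on the MSE. The `fit_global_weights` function below is hypothetical and is not the MoWE training procedure, which trains a Vision Transformer gating network that outputs per-pixel weight and bias maps; it only shows what minimizing MSE against the ground truth does to the expert weights.

```python
from typing import List, Sequence


def mse(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Mean squared error between a prediction and the ground truth."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def fit_global_weights(forecasts: List[List[float]], truth: List[float],
                       lr: float = 0.05, steps: int = 2000) -> List[float]:
    """Fit one scalar weight per expert by gradient descent on the MSE.

    Hypothetical simplification: no gating network and no bias map, and the
    same weights are used at every grid point. `lr` must be small relative
    to the scale of the forecasts, or the updates diverge.
    """
    k, n = len(forecasts), len(truth)
    w = [1.0 / k] * k  # start from the simple mean of the experts
    for _ in range(steps):
        pred = [sum(w[e] * forecasts[e][i] for e in range(k)) for i in range(n)]
        resid = [pred[i] - truth[i] for i in range(n)]
        # dMSE/dw_e = (2 / n) * sum_i resid_i * forecasts[e][i]
        grads = [2.0 / n * sum(resid[i] * forecasts[e][i] for i in range(n))
                 for e in range(k)]
        w = [w[e] - lr * grads[e] for e in range(k)]
    return w


# Toy example: expert 0 matches the truth exactly, expert 1 does not.
truth = [1.0, 2.0, 3.0, 4.0]
experts = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]]
weights = fit_global_weights(experts, truth)
```

Because expert 1 adds no useful information in this toy case, the fitted weights approach 1 and 0 and the MSE approaches zero. MoWE’s gating network instead learns weights that vary by pixel and lead time, which is what lets different experts dominate in different regions.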

Results show that MoWE achieves the lowest RMSE for every evaluated atmospheric variable at every lead time from 6 hours to 2 days. At short lead times, the best individual expert beats the simple mean of the experts; at longer lead times (1-2 days), the simple mean can pull ahead because averaging cancels part of the experts’ errors. MoWE outperforms both the best individual expert and the simple mean in all of these cases.
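The averaging effect and the “10% lower RMSE” figure can be made concrete with a few lines of Python. The helper names are hypothetical, and the toy error values are invented so that the two experts’ errors partly cancel; they are not results from the paper.

```python
import math
from typing import List, Sequence


def rmse(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Root mean squared error over all grid points."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def simple_mean(forecasts: List[Sequence[float]]) -> List[float]:
    """Equal-weight average of all experts at each grid point."""
    return [sum(values) / len(values) for values in zip(*forecasts)]


def relative_improvement(baseline_rmse: float, new_rmse: float) -> float:
    """Fractional RMSE reduction; 0.10 corresponds to '10% lower RMSE'."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Toy data, invented for illustration (not results from the paper).
truth = [0.0, 0.0, 0.0, 0.0]
expert_a = [1.0, -1.0, 1.0, -1.0]
expert_b = [-1.0, 1.0, 1.0, -1.0]

best_single = min(rmse(expert_a, truth), rmse(expert_b, truth))  # 1.0
mean_rmse = rmse(simple_mean([expert_a, expert_b]), truth)       # about 0.707
gain = relative_improvement(best_single, mean_rmse)              # about 0.29
```

Each toy expert has an RMSE of 1.0, while their simple mean has an RMSE of about 0.707, roughly a 29% reduction, because the two errors cancel at two of the four points. When expert errors are correlated, averaging cancels less of them, which leaves room for a learned, location-dependent weighting to do better than the equal-weight mean.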

An ablation study on model capacity found that a Base model (25 million parameters) performed only marginally better than a Small model (9 million parameters), suggesting that even a lightweight gating network is effective. Qualitatively, MoWE’s forecasts were consistent with those of the baseline models, and the learned weights varied with lead time, channel, and spatial location. At a 6-hour lead time, for example, MoWE relied heavily on Aurora, whereas at 24 and 48 hours the weights were spread more evenly across FCN3, Aurora, and Pangu, often in patterns that followed geographical features.

In conclusion, the MoWE framework offers a strategic and effective alternative to developing new standalone models: it leverages the collective strengths of existing expert models to significantly improve forecast skill. This shows that valuable, complementary information is distributed across different models and can be harnessed effectively. MoWE’s advantage over simpler ensembling strategies also indicates that it can identify the locations and lead times where each expert is strongest and exploit those advantages.


The current approach has limitations: rollout times are fixed, and simple channel concatenation becomes increasingly infeasible as more experts are added. Future work aims to address these through online training setups and dimensionality reduction strategies. This research points toward a shift from competing models to collaborative ones, encouraging community effort on the next generation of weather forecasting systems. You can find the full research paper here: MOWE: A Mixture of Weather Experts.