Advancing Climate AI with Chinese Hardware: A Deep Dive into Model Migration and Optimization

TLDR: Researchers developed a framework to migrate large-scale atmospheric and oceanic AI models from PyTorch to MindSpore, optimizing them for Chinese hardware like Ascend and DCU chips. The study found that these models maintained their accuracy while achieving competitive performance and superior energy efficiency compared to traditional GPUs, paving the way for greater technological independence in scientific computing.

Artificial intelligence is rapidly transforming climate and weather research by making model training and inference more efficient. However, many advanced models, such as FourCastNet and AI-GOMS, have been developed for and run primarily on GPUs. That dependence limits hardware independence and makes the models hard to deploy on domestic Chinese hardware and software frameworks.

A recent study introduces a comprehensive framework designed to address this challenge. It focuses on migrating large-scale atmospheric and oceanic AI models from the widely used PyTorch framework to MindSpore, a prominent Chinese deep learning framework. The goal is to optimize these models for Chinese chips, including Huawei’s Ascend and Sugon’s Deep Computing Unit (DCU), and then rigorously evaluate their performance against GPU-based systems.

Migration and Optimization Strategies

The framework addresses three areas: software-hardware adaptation, memory optimization, and parallelism. Migrating from PyTorch to MindSpore is the most demanding step, because PyTorch builds its computation graph dynamically (the model structure can change at runtime), while MindSpore, as used in the study, relies on a static graph (the structure is fixed before execution). According to the study, this requires redesigning the model’s logic in MindSpore and explicitly declaring input dimensions.
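The practical effect of the static-graph requirement can be shown with a toy sketch in plain Python. This is not MindSpore code: `StaticGraph`, `declared_shape`, and `scale_rows` are hypothetical names, and the compilation step is only simulated. The point is that the input shape is fixed once, up front, and any input that does not match is rejected instead of being handled at runtime.

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]


def shape_of(x: Matrix) -> Tuple[int, int]:
    """Return (rows, cols) of a rectangular nested list."""
    return (len(x), len(x[0]) if x else 0)


class StaticGraph:
    """Toy stand-in for a compiled graph: the input shape is declared once."""

    def __init__(self, fn: Callable[[Matrix], Matrix],
                 declared_shape: Tuple[int, int]):
        self.fn = fn
        self.declared_shape = declared_shape
        self.compile_count = 0
        self._compiled = False

    def __call__(self, x: Matrix) -> Matrix:
        # Shapes are checked against the declaration, not inferred per call.
        if shape_of(x) != self.declared_shape:
            raise ValueError(
                f"expected input shape {self.declared_shape}, got {shape_of(x)}"
            )
        if not self._compiled:
            # A real framework would trace and optimize the graph here.
            self.compile_count += 1
            self._compiled = True
        return self.fn(x)


def scale_rows(x: Matrix) -> Matrix:
    """Hypothetical model body: double every element."""
    return [[2.0 * v for v in row] for row in x]


graph = StaticGraph(scale_rows, declared_shape=(2, 3))
out = graph([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

Calling `graph` again with another 2x3 input reuses the simulated compilation, while a 1x2 input raises `ValueError`. In a dynamic-graph workflow the second case would simply run.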

Operator adaptation is another significant aspect, because MindSpore’s operator library does not fully cover PyTorch’s. The researchers used equivalent MindSpore operators where they existed, developed custom operators for unsupported functions, and otherwise restructured the computational logic using low-level APIs. The framework also leverages MindSpore-specific features: mixed precision training (reducing memory usage by using 16-bit floating-point numbers), built-in distributed computation support (splitting models across multiple chips), and graph mode optimizations (improving training speed by optimizing the computational graph during compilation).
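The three-tier operator strategy described above can be sketched as a simple lookup with fallbacks. Everything here is illustrative: the registries `NATIVE_OPS`, `CUSTOM_OPS`, and `COMPOSED_OPS`, the `resolve` helper, and the operator names are assumptions made for the example, and they say nothing about which operators MindSpore actually provides or lacks.

```python
import math
from typing import Callable, Dict, Tuple

ScalarOp = Callable[[float], float]

# Tier 1 (hypothetical): operators with a direct equivalent in the target.
NATIVE_OPS: Dict[str, ScalarOp] = {
    "tanh": math.tanh,
    "exp": math.exp,
}


def clamp01(x: float) -> float:
    """Hypothetical hand-written operator: clamp a value to [0, 1]."""
    return min(1.0, max(0.0, x))


# Tier 2 (hypothetical): custom operators written for unsupported functions.
CUSTOM_OPS: Dict[str, ScalarOp] = {"clamp01": clamp01}


def softplus_from_primitives(x: float) -> float:
    """Rebuild softplus(x) = log(1 + exp(x)) from lower-level primitives."""
    return math.log1p(math.exp(x))


# Tier 3 (hypothetical): logic restructured from low-level building blocks.
COMPOSED_OPS: Dict[str, ScalarOp] = {"softplus": softplus_from_primitives}


def resolve(op_name: str) -> Tuple[str, ScalarOp]:
    """Return (strategy, implementation), trying the cheapest option first."""
    if op_name in NATIVE_OPS:
        return "native", NATIVE_OPS[op_name]
    if op_name in CUSTOM_OPS:
        return "custom", CUSTOM_OPS[op_name]
    if op_name in COMPOSED_OPS:
        return "composed", COMPOSED_OPS[op_name]
    raise NotImplementedError(f"no migration path for operator {op_name!r}")


# Migration plan for the operators a hypothetical model uses.
plan = {name: resolve(name)[0] for name in ["tanh", "clamp01", "softplus"]}
```

Raising an error for an unresolved operator, instead of silently skipping it, makes gaps in coverage visible before training starts.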

For hardware adaptation, the team designed optimization strategies targeted at the capabilities of each Chinese chip. On the Ascend 910b, for instance, operator-level optimizations make use of the chip’s built-in hardware accelerators, such as its matrix computation units. Distributed training handles models that exceed the capacity of a single chip, and memory management on Ascend combines mixed precision training with pipelined execution to avoid memory peaks.
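The memory saving from 16-bit storage, and the precision it costs, can be checked with a few lines of standard-library Python. The parameter count below is a hypothetical figure chosen for illustration, not a number from the study, and the estimate covers parameter storage only (no activations, gradients, or optimizer state).

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def param_memory_gib(n_params: int, bytes_per_param: int) -> float:
    """Memory needed to store the parameters alone, in GiB."""
    return n_params * bytes_per_param / 2**30


n_params = 1_000_000_000  # hypothetical model size

fp32_gib = param_memory_gib(n_params, 4)  # 32-bit floats: 4 bytes each
fp16_gib = param_memory_gib(n_params, 2)  # 16-bit floats: 2 bytes each

# The cost of the smaller format: half-precision values near 1.0 are spaced
# 2**-10 apart, so an increment of 1e-4 is rounded away entirely.
lost_update = to_fp16(1.0 + 1e-4)
```

Here `fp16_gib` is exactly half of `fp32_gib`, and `lost_update` comes back as `1.0`. This rounding is why mixed precision schemes generally keep some quantities in 32-bit while storing others in 16-bit.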

Performance Evaluation

The study evaluated model performance across four groups of metrics: training efficiency (time per epoch and total training duration), inference efficiency (single-chip inference time), model accuracy (measured with RMSE and ACC), and energy efficiency. The experiments involved three representative models: FourCastNet (a weather forecasting model based on Adaptive Fourier Neural Operators), GraphCast (a Graph Neural Network-based weather model), and AI-GOMS (a large-scale AI-driven global ocean model).
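The two accuracy metrics can be sketched as follows, assuming ACC stands for the anomaly correlation coefficient, as is usual in weather forecasting. This is a simplified, unweighted version over flat lists: evaluations on global grids typically apply latitude weighting, which is omitted here, and the sample values are made up for illustration.

```python
import math
from typing import Sequence


def rmse(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between forecast and observation."""
    n = len(forecast)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / n)


def acc(forecast: Sequence[float], observed: Sequence[float],
        climatology: Sequence[float]) -> float:
    """Anomaly correlation coefficient (unweighted sketch).

    Both fields are first turned into anomalies by subtracting the
    climatological mean, then correlated.
    """
    f_anom = [f - c for f, c in zip(forecast, climatology)]
    o_anom = [o - c for o, c in zip(observed, climatology)]
    numerator = sum(a * b for a, b in zip(f_anom, o_anom))
    denominator = math.sqrt(
        sum(a * a for a in f_anom) * sum(b * b for b in o_anom)
    )
    return numerator / denominator


# Made-up values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
climatology = [2.5, 2.5, 2.5, 2.5]
forecast = [1.5, 2.5, 3.5, 4.5]

example_rmse = rmse(forecast, observed)
example_acc = acc(forecast, observed, climatology)
```

A lower RMSE and an ACC closer to 1 both indicate a better forecast. A perfect forecast gives an RMSE of 0 and an ACC of 1.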

The hardware configurations included the Huawei Ascend 910b and Sugon DCU Z100L, with NVIDIA A100 and NVIDIA RTX 3090 GPUs as the comparison baseline. The datasets were ERA5 for atmospheric data and HYCOM for oceanic parameters.

Key Findings

Experimental results demonstrated that the migration and optimization process successfully preserved the models’ original accuracy, with deviations generally under 5%. In terms of training efficiency, the Ascend 910b platform running on PyTorch achieved training times nearly identical to the A100. When migrated to MindSpore, Ascend 910b showed even higher efficiency, with total training time dropping by approximately 10% compared to its PyTorch counterpart, thanks to MindSpore’s optimization features and distributed training capabilities. The DCU platform had longer training times but showed significant scalability potential with multi-device parallel training.
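The two percentages in this paragraph are simple relative differences. The sketch below shows one way to compute them. The helper names and all input values are hypothetical, and the study may define its deviation measure differently.

```python
def relative_deviation(migrated: float, baseline: float) -> float:
    """Absolute difference from the baseline, as a fraction of the baseline."""
    return abs(migrated - baseline) / abs(baseline)


def within_tolerance(migrated: float, baseline: float,
                     tolerance: float = 0.05) -> bool:
    """True if the migrated metric deviates by less than the tolerance."""
    return relative_deviation(migrated, baseline) < tolerance


def time_reduction(before_hours: float, after_hours: float) -> float:
    """Fraction of training time saved relative to the earlier run."""
    return (before_hours - after_hours) / before_hours


# Hypothetical accuracy metric before and after migration.
baseline_rmse = 2.00
migrated_rmse = 2.06
deviation = relative_deviation(migrated_rmse, baseline_rmse)  # about 3%
accuracy_preserved = within_tolerance(migrated_rmse, baseline_rmse)

# Hypothetical total training times on the same chip under two frameworks.
pytorch_hours = 40.0
mindspore_hours = 36.0
saved = time_reduction(pytorch_hours, mindspore_hours)  # about 10%
```

With these made-up inputs the deviation is about 3%, inside the 5% bound, and the time saving is about 10%.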

The training loss declined consistently on all platforms, indicating that the migrated models trained stably. MindSpore on Ascend 910b produced smoother loss curves, which suggests improved stability and efficiency from static graph optimization and mixed precision training.

For inference efficiency, Ascend 910b’s single-step inference time was comparable to A100. Notably, MindSpore on Ascend 910b improved inference speed compared to the PyTorch version. While DCU showed slightly higher inference times, especially for complex models like GraphCast, the Chinese chips demonstrated competitiveness in many tasks.

A significant advantage of the Chinese chips was their energy efficiency. During training, Ascend 910b showed superior energy efficiency, with an average power consumption increase of about 15% while maintaining comparable performance to GPUs. The DCU platform exhibited even lower power consumption, achieving an energy efficiency ratio 1.3 times higher than GPUs in the inference phase, making it ideal for long-duration inference tasks. Overall, Chinese chips demonstrated outstanding energy efficiency, particularly in inference scenarios.
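One way to read an energy efficiency ratio is as work done per unit of energy. The sketch below assumes throughput per watt (samples per joule) as the measure, which the article does not spell out, and every number in it is hypothetical. It shows how a device can be slower in absolute terms and still come out 1.3 times more efficient.

```python
def samples_per_joule(samples_per_second: float, power_watts: float) -> float:
    """Throughput per watt: inference samples processed per joule of energy."""
    return samples_per_second / power_watts


def efficiency_ratio(candidate: float, reference: float) -> float:
    """How many times more efficient the candidate is than the reference."""
    return candidate / reference


# Hypothetical measurements: the reference device is faster but draws more power.
reference_eff = samples_per_joule(samples_per_second=100.0, power_watts=300.0)
candidate_eff = samples_per_joule(samples_per_second=78.0, power_watts=180.0)

ratio = efficiency_ratio(candidate_eff, reference_eff)
```

With these inputs `ratio` is about 1.3, even though the candidate device processes fewer samples per second. Over a long-running inference job, total energy use scales with the inverse of this efficiency.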

Conclusion and Future Outlook

This research confirms the viability and benefits of deploying large-scale atmospheric and oceanic AI models on Chinese hardware platforms. The migrated models maintain accuracy and achieve competitive computational performance, with Ascend 910b excelling in distributed training and energy efficiency, and DCU showing promise for energy-efficient inference. This work provides valuable insights and practical guidance for leveraging Chinese domestic chips and frameworks, offering a pathway toward greater technological independence in scientific computing. For more details, refer to the full research paper.

Future efforts will focus on enhancing MindSpore’s operator library, refining distributed training efficiency, and exploring hardware-software co-design strategies tailored for specific meteorological and oceanographic applications. Fostering a robust ecosystem with optimized toolchains and open-source collaboration will be crucial for broader adoption of these platforms.
