TLDR: Researchers developed a framework to migrate large-scale atmospheric and oceanic AI models from PyTorch to MindSpore, optimizing them for Chinese hardware like Ascend and DCU chips. The study found that these models maintained their accuracy while achieving competitive performance and superior energy efficiency compared to traditional GPUs, paving the way for greater technological independence in scientific computing.
Artificial intelligence is rapidly transforming climate and weather research, enabling more efficient model training and inference. However, many advanced models, such as FourCastNet and AI-GOMS, have traditionally relied heavily on GPUs, which limits their portability to other platforms, particularly domestic Chinese hardware and software frameworks.
A recent study introduces a comprehensive framework designed to address this challenge. It focuses on migrating large-scale atmospheric and oceanic AI models from the widely used PyTorch framework to MindSpore, a prominent Chinese deep learning framework. The goal is to optimize these models for Chinese chips, including Huawei’s Ascend and Sugon’s Deep Computing Unit (DCU), and then rigorously evaluate their performance against GPU-based systems.
Migration and Optimization Strategies
The framework tackles several key areas: software-hardware adaptation, memory optimization, and parallelism. Migrating from PyTorch to MindSpore is a central and nontrivial step, because PyTorch uses a dynamic graph mechanism (allowing flexible model structure changes during runtime), while MindSpore employs a static graph mechanism (requiring predefined structures). This necessitates a complete redesign of the model’s logic in MindSpore, along with explicit declaration of input dimensions.
Operator adaptation is another significant aspect. MindSpore’s operator library doesn’t always fully cover PyTorch’s. To overcome this, the researchers prioritized using equivalent MindSpore operators, developed custom operators for unsupported functions, or restructured computational logic using low-level APIs. Furthermore, the framework leverages MindSpore-specific features like mixed precision training (reducing memory usage by using 16-bit floating-point numbers), built-in distributed computation support (splitting models across multiple chips), and graph mode optimizations (improving training speed by optimizing the computational graph during compilation).
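The three-tier fallback described above (equivalent operator first, then a custom operator, then restructuring with low-level APIs) can be sketched in plain Python. This is only an illustration of the decision order, not the researchers' tooling: the lookup table, the `register_custom` decorator, the `resolve` function, and the operator name `example.spectral_filter` are all hypothetical, and the table lists just a few well-known PyTorch/MindSpore name pairs.

```python
from typing import Callable, Dict, Tuple

# Illustrative (not exhaustive, not official) PyTorch -> MindSpore name pairs.
EQUIVALENT_OPS: Dict[str, str] = {
    "torch.nn.Linear": "mindspore.nn.Dense",
    "torch.nn.Conv2d": "mindspore.nn.Conv2d",
    "torch.matmul": "mindspore.ops.matmul",
}

# Hypothetical registry of hand-written replacements for unsupported operators.
CUSTOM_OPS: Dict[str, Callable] = {}


def register_custom(name: str) -> Callable:
    """Decorator that records a custom implementation under a PyTorch op name."""
    def decorator(fn: Callable) -> Callable:
        CUSTOM_OPS[name] = fn
        return fn
    return decorator


@register_custom("example.spectral_filter")  # hypothetical operator name
def spectral_filter(values):
    """Placeholder body; a real custom operator would do the actual math."""
    return list(values)


def resolve(op_name: str) -> Tuple[str, str]:
    """Pick a migration strategy for one operator, in priority order."""
    if op_name in EQUIVALENT_OPS:        # 1. use an equivalent operator
        return ("equivalent", EQUIVALENT_OPS[op_name])
    if op_name in CUSTOM_OPS:            # 2. fall back to a custom operator
        return ("custom", CUSTOM_OPS[op_name].__name__)
    # 3. otherwise the logic has to be rebuilt from low-level APIs
    return ("restructure", "compose from low-level APIs")
```

Calling `resolve` on every operator a model uses gives a quick inventory of how much of the migration is a rename and how much needs new code.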
For hardware adaptation, the team designed targeted optimization strategies to fully exploit the capabilities of Chinese chips. On the Ascend 910b, for instance, operator-level optimizations make use of the chip’s built-in hardware accelerators, such as its matrix computation units. Distributed training handles large models that exceed the capacity of a single chip. Memory management improvements include mixed precision training and pipelined execution on Ascend to avoid memory peaks.
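To see why mixed precision eases memory pressure, the following sketch compares the storage needed for one field in 32-bit and 16-bit floating point, using only Python's standard `struct` module. The field size (20 variables on a 720 x 1440 grid) is an illustrative assumption, and the helper names `tensor_bytes` and `to_float16` are hypothetical. The sketch shows the storage arithmetic and the rounding cost only, not how MindSpore implements mixed precision.

```python
import struct


def tensor_bytes(num_elements: int, fmt: str) -> int:
    """Bytes needed to store num_elements values in the given struct format."""
    return num_elements * struct.calcsize(fmt)


def to_float16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Illustrative field size (an assumption, not a figure from the study).
num_elements = 20 * 720 * 1440

fp32_bytes = tensor_bytes(num_elements, "<f")  # 4 bytes per value
fp16_bytes = tensor_bytes(num_elements, "<e")  # 2 bytes per value

# Half precision halves the storage but keeps only about 3 decimal digits.
rounding_error = abs(to_float16(0.1) - 0.1)
```

The rounding error is the reason mixed precision schemes typically keep sensitive quantities, such as accumulated weights, in 32-bit precision while storing activations in 16-bit.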
Performance Evaluation
The study evaluated model performance across multiple metrics: training efficiency (time per epoch and total training duration), inference efficiency (single-chip inference time), model accuracy (measured by root mean square error, RMSE, and anomaly correlation coefficient, ACC), and energy efficiency. The experiments involved three representative models: FourCastNet (a weather forecasting model based on Adaptive Fourier Neural Operators), GraphCast (a Graph Neural Network-based weather model), and AI-GOMS (the first large-scale oceanic model).
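For readers unfamiliar with the two accuracy metrics, here is a minimal sketch of RMSE (root mean square error) and ACC (anomaly correlation coefficient) over flat lists of grid values. It assumes unweighted averaging. Weather-model evaluations commonly apply latitude weighting, and the article does not say which variant the study used. The function names are illustrative.

```python
import math
from typing import Sequence


def rmse(forecast: Sequence[float], truth: Sequence[float]) -> float:
    """Root mean square error between forecast and ground truth (unweighted)."""
    squared = [(f - t) ** 2 for f, t in zip(forecast, truth)]
    return math.sqrt(sum(squared) / len(squared))


def acc(forecast: Sequence[float], truth: Sequence[float],
        climatology: Sequence[float]) -> float:
    """Anomaly correlation coefficient: correlation of departures from climatology."""
    f_anom = [f - c for f, c in zip(forecast, climatology)]
    t_anom = [t - c for t, c in zip(truth, climatology)]
    numerator = sum(a * b for a, b in zip(f_anom, t_anom))
    denominator = math.sqrt(sum(a * a for a in f_anom) * sum(b * b for b in t_anom))
    return numerator / denominator
```

Lower RMSE is better, while ACC approaches 1 as the forecast anomalies line up with the observed ones. Comparing both metrics before and after migration is one way to check that a ported model still behaves like the original.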
The hardware configurations included the Huawei Ascend 910b, Sugon DCU Z100L, and for comparison, NVIDIA A100 and NVIDIA 3090 GPUs. Datasets used were ERA5 for atmospheric data and HYCOM for oceanic parameters.
Key Findings
Experimental results demonstrated that the migration and optimization process successfully preserved the models’ original accuracy, with deviations generally under 5%. In terms of training efficiency, the Ascend 910b platform running on PyTorch achieved training times nearly identical to the A100. When migrated to MindSpore, Ascend 910b showed even higher efficiency, with total training time dropping by approximately 10% compared to its PyTorch counterpart, thanks to MindSpore’s optimization features and distributed training capabilities. The DCU platform had longer training times but showed significant scalability potential with multi-device parallel training.
Training accuracy analysis revealed consistent loss function declines across all platforms, indicating that the migrated models maintained stable training performance. The MindSpore framework on Ascend 910b exhibited smoother loss curves, suggesting improved stability and efficiency through static graph optimization and mixed precision training.
For inference efficiency, Ascend 910b’s single-step inference time was comparable to A100. Notably, MindSpore on Ascend 910b improved inference speed compared to the PyTorch version. While DCU showed slightly higher inference times, especially for complex models like GraphCast, the Chinese chips demonstrated competitiveness in many tasks.
A significant advantage of the Chinese chips was their energy efficiency. During training, Ascend 910b showed superior energy efficiency, with an average power consumption increase of about 15% while maintaining comparable performance to GPUs. The DCU platform exhibited even lower power consumption, achieving an energy efficiency ratio 1.3 times higher than GPUs in the inference phase, making it ideal for long-duration inference tasks. Overall, Chinese chips demonstrated outstanding energy efficiency, particularly in inference scenarios.
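An energy efficiency ratio of this kind can be read as work done per unit of energy, relative to a baseline. The sketch below makes that arithmetic explicit. The article does not give the study's exact definition, so the formula (samples processed per joule) is an assumption, and every number in the example is invented purely for illustration.

```python
def energy_joules(avg_power_watts: float, duration_seconds: float) -> float:
    """Energy consumed: average power multiplied by elapsed time."""
    return avg_power_watts * duration_seconds


def samples_per_joule(samples: int, avg_power_watts: float,
                      duration_seconds: float) -> float:
    """Work done per unit of energy (assumed efficiency definition)."""
    return samples / energy_joules(avg_power_watts, duration_seconds)


# Invented measurements: the second device is slower but draws less power.
baseline = samples_per_joule(samples=1000, avg_power_watts=300.0,
                             duration_seconds=100.0)
candidate = samples_per_joule(samples=1000, avg_power_watts=200.0,
                              duration_seconds=115.0)

# A value above 1 means the candidate does more work per joule.
efficiency_ratio = candidate / baseline
```

With these made-up inputs the slower device still comes out ahead per joule, which is the pattern the article describes for long-running inference workloads.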
Conclusion and Future Outlook
This research confirms the viability and benefits of deploying large-scale atmospheric and oceanic AI models on Chinese hardware platforms. The migrated models maintain accuracy and achieve competitive computational performance, with Ascend 910b excelling in distributed training and energy efficiency, and DCU showing promise for energy-efficient inference. This work provides valuable insights and practical guidance for leveraging Chinese domestic chips and frameworks, offering a pathway toward greater technological independence in scientific computing. For more details, refer to the full research paper.
Future efforts will focus on enhancing MindSpore’s operator library, refining distributed training efficiency, and exploring hardware-software co-design strategies tailored for specific meteorological and oceanographic applications. Fostering a robust ecosystem with optimized toolchains and open-source collaboration will be crucial for broader adoption of these platforms.


