TLDR: GMoPE (Graph Mixture of Prompt-Experts) is a new framework that combines Mixture-of-Experts (MoE) with prompt-based learning to create more generalizable and efficient graph foundation models. It uses expert-specific prompts and structure-aware routing to allow experts to specialize in different data subdomains. A soft orthogonality constraint prevents expert collapse, and a prompt-only fine-tuning strategy significantly reduces adaptation costs. Experiments show GMoPE outperforms existing baselines in various graph tasks while being highly efficient.
Graph Neural Networks (GNNs) have shown great promise in specific tasks, but their ability to work well across many different types of data and problems has been limited. Often, they face challenges like negative transfer (where learning from one task hurts performance on another), scalability issues with large datasets, and high costs when adapting to new tasks. To tackle these problems, researchers have introduced a new framework called GMoPE, which stands for Graph Mixture of Prompt-Experts. This innovative approach combines two powerful concepts: the Mixture-of-Experts (MoE) architecture and prompt-based learning, specifically tailored for graph data.
GMoPE is designed to make graph foundation models more adaptable and efficient. It does this by using a system where different ‘experts’ specialize in distinct subdomains of graph data. Each expert is guided by its own ‘prompt’: a small, learnable vector that steers the expert toward its subdomain. A ‘structure-aware’ routing mechanism then dynamically decides how much each expert contributes to a prediction based on the input graph’s characteristics, so that the most relevant experts are engaged for any given input.
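To make the routing idea concrete, here is a minimal PyTorch sketch of a prompt-conditioned expert mixture. The class name `PromptExpertMoE`, the MLP experts, and the use of mean node features plus mean degree as the router’s structure signal are illustrative assumptions, not the paper’s exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptExpertMoE(nn.Module):
    """Illustrative prompt-expert mixture with structure-aware routing.

    Each expert owns a learnable prompt vector that is added to the node
    representations before that expert processes them. The router scores
    experts from a simple structural summary of the graph and mixes the
    expert outputs with softmax weights.
    """

    def __init__(self, in_dim, hid_dim, num_experts=4):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_experts, in_dim) * 0.01)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                           nn.Linear(hid_dim, hid_dim))
             for _ in range(num_experts)]
        )
        # Router input: mean node features + mean degree (structure signal).
        self.router = nn.Linear(in_dim + 1, num_experts)

    def forward(self, x, adj):
        # x: [N, in_dim] node features, adj: [N, N] dense adjacency matrix.
        deg = adj.sum(dim=1, keepdim=True)                # [N, 1]
        summary = torch.cat([x.mean(0), deg.mean(0)])     # [in_dim + 1]
        gate = F.softmax(self.router(summary), dim=-1)    # [num_experts]

        # One-hop neighbour aggregation shared by all experts.
        agg = (adj @ x) / deg.clamp(min=1)

        outs = []
        for k, expert in enumerate(self.experts):
            # Prompt-conditioned input: add this expert's prompt to every node.
            outs.append(expert(agg + self.prompts[k]))    # [N, hid_dim]
        outs = torch.stack(outs, dim=0)                   # [K, N, hid_dim]
        return (gate.view(-1, 1, 1) * outs).sum(dim=0)    # [N, hid_dim]
```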
A common issue in Mixture-of-Experts models is ‘expert collapse,’ where some experts might become underutilized or redundant. GMoPE addresses this by introducing a ‘soft orthogonality constraint’ across the prompt vectors. This constraint encourages the prompts, and thus the experts, to be distinct and specialized, promoting a more balanced use of all experts and preventing them from learning the same things. This diversification is key to the framework’s robustness.
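A common way to implement such a soft orthogonality constraint is to penalize the off-diagonal entries of the prompts’ normalized Gram matrix. The sketch below, including the function name and the way the term would be weighted into the training objective, is an assumed implementation rather than the paper’s exact formulation.

```python
import torch
import torch.nn.functional as F

def prompt_orthogonality_loss(prompts: torch.Tensor) -> torch.Tensor:
    """Soft orthogonality penalty over expert prompt vectors.

    prompts: [num_experts, dim] matrix of learnable prompts.
    Penalises the off-diagonal entries of the normalised Gram matrix so
    that prompts (and thus experts) stay mutually dissimilar without being
    forced to be exactly orthogonal.
    """
    normed = F.normalize(prompts, dim=-1)     # unit-length prompt vectors
    gram = normed @ normed.t()                # pairwise cosine similarities
    off_diag = gram - torch.eye(gram.size(0), device=gram.device)
    return (off_diag ** 2).mean()

# During pre-training this term would be added to the main objective with a
# small coefficient, e.g. loss = task_loss + lam * prompt_orthogonality_loss(prompts).
```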
Another significant advantage of GMoPE is its efficiency during transfer learning. Instead of fine-tuning all the model’s parameters for a new task, GMoPE employs a ‘prompt-only fine-tuning’ strategy. This means that only the lightweight prompt vectors and a small task-specific prediction head are adjusted, while the core parameters of the experts remain frozen. This dramatically reduces the computational time and resources needed to adapt the model to new downstream tasks, making it much more practical for real-world applications.
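In PyTorch terms, prompt-only fine-tuning amounts to freezing everything except the prompt parameters and the small task head. The helper below is a hedged sketch that reuses the `PromptExpertMoE` naming from the earlier sketch; the attribute name `prompts`, the optimizer choice, and the learning rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

# model = PromptExpertMoE(...)              # pre-trained mixture from the sketch above
# head  = nn.Linear(hid_dim, num_classes)   # small task-specific prediction head

def configure_prompt_only_finetuning(model: nn.Module, head: nn.Module):
    """Freeze expert and router weights, leaving only prompts + head trainable."""
    for name, param in model.named_parameters():
        # Keep the lightweight prompt vectors trainable; freeze everything else.
        param.requires_grad = name.startswith("prompts")
    trainable = [p for p in model.parameters() if p.requires_grad]
    trainable += list(head.parameters())
    return torch.optim.Adam(trainable, lr=1e-3)
```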
The GMoPE framework operates in three main stages. First, during the pre-training phase, all expert parameters and their associated prompts are optimized together. This stage uses the structure-aware MoE routing and the soft orthogonality loss to ensure experts specialize and diversify. Second, in the transfer learning phase, the expert parameters are frozen, and only the lightweight prompts are fine-tuned for efficient adaptation to new tasks. Finally, during inference, the router aggregates outputs from the prompt-conditioned experts to generate the final graph representations for various downstream tasks.
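Putting the three stages together, a hypothetical end-to-end pipeline might look like the following. It reuses the sketches above (`PromptExpertMoE`, `prompt_orthogonality_loss`, `configure_prompt_only_finetuning`); the loss functions, data iterables, and hyperparameters are placeholders rather than the paper’s settings.

```python
import torch

def pretrain(model, graphs, pretext_loss_fn, lam=0.1, epochs=100):
    # Stage 1: optimise experts and prompts jointly, with the orthogonality penalty.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, adj in graphs:
            z = model(x, adj)
            loss = pretext_loss_fn(z) + lam * prompt_orthogonality_loss(model.prompts)
            opt.zero_grad()
            loss.backward()
            opt.step()

def transfer(model, head, graphs, task_loss_fn, epochs=50):
    # Stage 2: experts stay frozen; only prompts and the task head are updated.
    opt = configure_prompt_only_finetuning(model, head)
    for _ in range(epochs):
        for x, adj, y in graphs:
            loss = task_loss_fn(head(model(x, adj)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()

@torch.no_grad()
def infer(model, head, x, adj):
    # Stage 3: the router inside the model mixes prompt-conditioned expert outputs.
    return head(model(x, adj))
```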
Extensive experiments have been conducted to validate GMoPE’s effectiveness across various pretraining strategies and multiple downstream tasks, including link prediction, node classification, and graph classification. The results consistently show that GMoPE outperforms state-of-the-art baseline models. In many cases, it even achieves performance comparable to full parameter fine-tuning, but with only a fraction of the adaptation overhead. For instance, in link prediction, GMoPE showed a 3.42% performance gain over a recent baseline, AnyGraph. For classification tasks, it achieved up to 19.84% improvement in node classification and 6.84% in graph classification over GraphPrompt.
Ablation studies further confirmed the importance of GMoPE’s core components. When the Mixture-of-Experts framework was reduced to a single expert (effectively becoming a GPF baseline), performance significantly degraded, highlighting the benefits of multiple specialized experts. Similarly, removing the expert prompts led to performance degradation and increased computational overhead, underscoring their role in specialization and efficient adaptation. The soft orthogonality loss was also shown to be crucial, as an optimal range for its coefficient ensured proper expert specialization without causing instability.
This work represents a significant step forward in developing generalizable and efficient graph foundation models. GMoPE provides a principled and scalable framework that can handle the complexities of diverse graph data, offering a powerful tool for advancing graph representation learning. You can read the full research paper for more technical details at GMoPE: A Prompt-Expert Mixture Framework for Graph Foundation Models.


