TL;DR: The Multi-view Feature Propagation (MFP) framework addresses the challenges of feature sparsity and privacy risks in Graph Neural Networks (GNNs). It enhances node classification performance by creating multiple Gaussian-noised views of available features, propagating them independently through the graph, and then aggregating them. Experiments show MFP outperforms existing baselines in node classification under extreme feature sparsity while significantly reducing privacy leakage, as it generates alternative representations rather than reconstructing original sensitive features. The framework is robust across varying graph properties and hyperparameters, offering a practical solution for privacy-aware graph learning.
Graph Neural Networks (GNNs) have become incredibly powerful tools for understanding complex relationships in data, from social networks to biological systems. They excel at tasks like classifying nodes within a graph, but their success often hinges on having complete and rich information about each node. However, in the real world, this ideal scenario is rare. Node features can be incomplete, or worse, contain highly sensitive personal information like demographics, health status, or preferences. Directly using such data raises significant privacy concerns and risks unintended data leakage.
To tackle this dual challenge of sparse features and privacy risks, researchers Etzion Harari and Moshe Unger from Tel Aviv University have introduced a novel framework called Multi-view Feature Propagation (MFP). This new approach aims to improve how GNNs perform node classification even when features are extremely sparse, while also ensuring that sensitive information remains protected.
The Problem with Traditional Approaches
One common strategy to protect privacy is to make features sparse, meaning only a small subset of features is used, limiting exposure. While this helps privacy, it often degrades the performance of GNNs, which expect a full set of features. Feature Propagation (FP) is a technique that helps fill in missing features by diffusing available information across the graph structure. However, traditional FP can still be problematic for privacy, as its goal is to reconstruct missing features, potentially re-exposing sensitive data.
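To make the contrast with MFP concrete, here is a minimal sketch of classic Feature Propagation on a small graph. The exact normalization and clamping details are assumptions based on the standard FP formulation (diffuse with a symmetrically normalized adjacency, then reset the known entries each step), not necessarily the paper's implementation:

```python
import numpy as np

def feature_propagation(A, X, known_mask, n_iters=40):
    """Classic FP sketch: repeatedly diffuse features over the graph,
    clamping the observed entries back to their true values each step.

    A          : (n, n) adjacency matrix
    X          : (n, d) feature matrix, missing entries set to 0
    known_mask : (n, d) boolean mask of observed entries
    """
    # Symmetrically normalized adjacency: D^{-1/2} A D^{-1/2}
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

    X_known = X * known_mask
    X_cur = X_known.copy()
    for _ in range(n_iters):
        X_cur = A_hat @ X_cur                    # diffuse one hop
        X_cur[known_mask] = X_known[known_mask]  # reset observed values
    return X_cur
```

Note how the fixed point explicitly fills in missing entries from neighbors' values, which is exactly why plain FP can re-expose sensitive features: the missing slots converge toward reconstructions of what the neighbors reveal.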
Introducing Multi-view Feature Propagation (MFP)
MFP extends the idea of traditional Feature Propagation by taking a multi-faceted approach. Instead of relying on a single set of features, MFP creates multiple, slightly different “views” of the available data. Here’s how it works:
First, a process called Stochastic Sparse Sampling is applied. This involves taking the original, potentially sensitive features and replacing most of them with random noise, while keeping only a small, randomly selected subset. This creates a privacy-preserving version of the features.
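A minimal sketch of this sampling step is below. The function name, the keep ratio, and the use of unit-variance Gaussian noise are illustrative assumptions; the paper describes the idea, not this exact code:

```python
import numpy as np

def stochastic_sparse_sample(X, keep_ratio=0.01, noise_std=1.0, rng=None):
    """Sketch of Stochastic Sparse Sampling: keep a small random subset
    of feature entries and replace every other entry with Gaussian noise."""
    rng = np.random.default_rng(rng)
    keep_mask = rng.random(X.shape) < keep_ratio      # entries to retain
    noise = rng.normal(0.0, noise_std, size=X.shape)  # noisy replacements
    X_sparse = np.where(keep_mask, X, noise)
    return X_sparse, keep_mask
```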
Next, Multi-view-based Propagation comes into play. MFP generates several complementary propagated views. Each view is created by taking an even smaller, randomly sampled subset of the already retained features, and then adding more noise to them. These noisy, partial feature sets are then independently propagated through the graph’s connections. Think of it like looking at the same object through several slightly different, blurry lenses – no single lens gives a perfect picture, but combining insights from all of them helps you understand the object better.
Finally, these independently propagated views are combined, or “aggregated,” into a single, rich representation. This combined representation is then fed into a GNN for the final classification task.
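The multi-view and aggregation steps above can be sketched as follows. The per-view sampling ratio, noise scale, propagation depth, and the mean aggregator are all assumptions chosen for illustration; the paper's hyperparameters and aggregation scheme may differ:

```python
import numpy as np

def mfp_views(A, X_sparse, keep_mask, n_views=4, view_ratio=0.5,
              noise_std=0.1, n_iters=10, rng=None):
    """Sketch of multi-view propagation and aggregation: each view
    re-samples a smaller subset of the retained entries, adds fresh
    Gaussian noise, propagates independently over the graph, and the
    views are averaged into a single representation."""
    rng = np.random.default_rng(rng)
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

    views = []
    for _ in range(n_views):
        # Keep an even smaller random subset of the retained entries
        sub_mask = keep_mask & (rng.random(X_sparse.shape) < view_ratio)
        X_view = np.where(sub_mask, X_sparse, 0.0)
        X_view = X_view + rng.normal(0.0, noise_std, size=X_view.shape)
        for _ in range(n_iters):           # propagate this view on its own
            X_view = A_hat @ X_view
        views.append(X_view)
    return np.mean(views, axis=0)           # aggregate the views
```

The output of this aggregation would then be the input to a downstream GNN classifier.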
Key Benefits: Performance and Privacy
The MFP framework offers two significant advantages:
- Enhanced Performance: By integrating multiple, diverse feature views, MFP creates richer and more balanced node representations. This reduces the risk of relying too heavily on any single piece of information and improves the model’s ability to generalize, leading to better node classification accuracy, especially when features are extremely sparse.
- Strong Privacy Preservation: Because MFP operates on heavily noised and partial views of the original features, it significantly limits the direct exposure of sensitive information. The process is designed not to reconstruct the original features, but rather to create alternative, privacy-preserving representations.
Experimental Validation
The researchers conducted extensive experiments on standard datasets like Cora, Citeseer, and Pubmed, simulating scenarios where 99% of features were missing. MFP consistently outperformed state-of-the-art baseline methods, including traditional Feature Propagation (FP) and Random Feature Propagation (RFP). Remarkably, MFP achieved classification accuracy levels very close to those of a GNN trained on the full, unmasked feature matrix, demonstrating its ability to maintain utility under extreme privacy constraints.
Regarding privacy, MFP showed minimal leakage. Measurements like Root Mean Square Error (RMSE) and Pearson Correlation Coefficient (PCC) indicated that the propagated features were not reconstructions of the original sensitive data. In fact, their similarity to the original features was often comparable to random noise.
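Both of these leakage measurements are standard and easy to reproduce. A minimal sketch (comparing original and propagated feature matrices; the paper may compute them per-feature or per-node rather than globally) looks like this:

```python
import numpy as np

def leakage_metrics(X_orig, X_prop):
    """Sketch of the reported leakage measurements: RMSE and the Pearson
    correlation between original and propagated features. High RMSE and
    near-zero correlation suggest the output is not a reconstruction."""
    rmse = np.sqrt(np.mean((X_orig - X_prop) ** 2))
    pcc = np.corrcoef(X_orig.ravel(), X_prop.ravel())[0, 1]
    return rmse, pcc
```

Under this reading, "comparable to random noise" means the PCC between MFP's output and the sensitive originals sits near the value you would get by correlating the originals with pure noise.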
A detailed sensitivity analysis further confirmed MFP’s robustness. Its performance improved with higher graph homophily (the tendency of similar nodes to connect) and was stable across a wide range of propagation depths and numbers of views. This means MFP is reliable and doesn’t require extensive fine-tuning for different datasets.
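For readers unfamiliar with the homophily measure behind that sensitivity analysis, one common definition is edge homophily: the fraction of edges whose endpoints share a label. This is a sketch of that common definition, not necessarily the exact variant the paper uses:

```python
import numpy as np

def edge_homophily(edges, labels):
    """Edge homophily sketch: fraction of edges connecting
    same-label node pairs (1.0 = perfectly homophilous graph)."""
    same = [labels[u] == labels[v] for u, v in edges]
    return float(np.mean(same))
```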
Practical Implications
MFP has significant implications for industries that handle sensitive user data, such as healthcare, finance, and e-commerce. Organizations can leverage this framework to extract valuable insights and power advanced analytics without violating privacy regulations. By masking and propagating only partial, noised views of user features, companies can safely analyze data, maintain compliance, and avoid costly legal and reputational risks. Moreover, it offers a dependable solution for general graph learning tasks where data might be incomplete or noisy.
While MFP shows great promise, the authors acknowledge areas for future work, such as exploring its application in more complex heterophilous graphs, integrating formal differential privacy accounting, and testing on larger, dynamic graph structures. Nevertheless, MFP represents a significant step forward in balancing predictive accuracy, stability, and privacy protection in graph learning. You can read the full research paper here.


