TLDR: PASSREFINDER-FL is a novel framework that leverages graph neural networks and federated learning to proactively predict credential stuffing risks between websites. It models password reuse as relations in a website graph, extracts diverse website features, and collaboratively trains an AI model across multiple administrators without sharing sensitive user data. This privacy-preserving approach achieves high accuracy in identifying high-risk password reuse, outperforming existing methods, and offers practical applications such as targeted user warnings and selective two-factor authentication.
Credential stuffing attacks continue to be a major threat to online users, exploiting the common habit of reusing passwords across different websites. These attacks cause significant financial losses and compromise user security. Traditional methods to combat this, such as compromised credential checking (C3) services like Have I Been Pwned, often react only after a data breach has occurred. Other approaches that involve coordinating websites to detect password reuse can severely impact user experience by restricting password creation or denying access, and they struggle with scalability and privacy concerns due to the need for sharing sensitive user information.
To address these limitations, researchers have introduced PASSREFINDER-FL, a framework designed to proactively predict the risk of credential stuffing between websites. Rather than reacting after a breach, it forecasts where password reuse is most likely to occur, allowing administrators to take preventative measures.
Understanding Password Reuse Relations
At the heart of PASSREFINDER-FL is the concept of “password reuse relations”: how likely users are to reuse the same passwords between two websites. The framework models these relations as edges in a “website graph,” where each website is a node. By analyzing these connections, PASSREFINDER-FL can assess the risk of credential reuse between sites. This graph-based approach, powered by Graph Neural Networks (GNNs), lets the model capture how password choices interact across a large number of online services.
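To give a feel for what a GNN does on such a website graph, here is a minimal sketch of one round of mean neighbor aggregation in plain Python. It is a simplification for illustration, not the architecture used in the paper; the site names, feature vectors, and the `aggregate_neighbors` helper are all assumptions.

```python
from typing import Dict, List, Tuple


def aggregate_neighbors(
    features: Dict[str, List[float]],
    edges: List[Tuple[str, str]],
) -> Dict[str, List[float]]:
    """One aggregation round: each node averages its own vector with its neighbors'."""
    neighbors = {node: set() for node in features}
    for a, b in edges:  # edges are undirected password reuse relations
        neighbors[a].add(b)
        neighbors[b].add(a)

    updated = {}
    for node, own in features.items():
        group = [own] + [features[n] for n in sorted(neighbors[node])]
        updated[node] = [
            sum(vec[i] for vec in group) / len(group) for i in range(len(own))
        ]
    return updated


# Hypothetical two-dimensional website features.
features = {
    "shop.example": [1.0, 0.0],
    "forum.example": [0.0, 1.0],
    "bank.example": [0.0, 0.0],
}
edges = [("shop.example", "forum.example")]

updated = aggregate_neighbors(features, edges)
print(updated["shop.example"])  # [0.5, 0.5]
```

After one round, the two connected sites share information while the isolated site keeps its own vector. Real GNN layers add learned weights and nonlinearities on top of this aggregation step.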
Overcoming Key Challenges
The development of PASSREFINDER-FL tackled three main technical hurdles:
- Complicated Password Reuse Relations: User password choices are influenced by many factors across different websites. PASSREFINDER-FL uses a graph structure and GNNs to effectively learn and represent these intricate relationships, benefiting from the propagation and aggregation of neighborhood influences within the graph.
- Diverse Data Structures and Feature Influences: Website characteristics that affect password reuse come in many forms (e.g., location, category, content, URL, security posture). To handle this diversity, PASSREFINDER-FL employs multi-modal learning and attention mechanisms. This allows the model to process each feature type independently and weigh its importance, leading to more accurate predictions.
- Privacy and Scalability with Arbitrary Websites: Predicting risks involving new or external websites traditionally requires sharing sensitive user data, which raises significant privacy concerns. PASSREFINDER-FL addresses this by adopting a federated learning (FL) approach. Instead of sharing raw user data or even local password reuse graphs, administrators exchange only model gradients or weights. This collaborative training allows the system to build a robust global model while keeping user data with the administrator who holds it.
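The attention idea from the second challenge can be sketched as a softmax-weighted sum over per-modality embeddings. This is an illustrative sketch only: in a trained model the `query` vector would be learned, the embeddings would come from modality-specific encoders, and the names used here are assumptions.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(
    embeddings: Dict[str, List[float]],
    query: List[float],
) -> Tuple[List[float], Dict[str, float]]:
    """Weigh each modality by its dot product with `query`, then sum.

    Assumes every embedding has the same dimension as `query`.
    """
    names = list(embeddings)
    scores = [sum(q * x for q, x in zip(query, embeddings[name])) for name in names]
    weights = softmax(scores)
    fused = [
        sum(w * embeddings[name][i] for w, name in zip(weights, names))
        for i in range(len(query))
    ]
    return fused, dict(zip(names, weights))


# Hypothetical modality embeddings for one website.
embeddings = {
    "category": [1.0, 0.0],
    "url": [0.0, 1.0],
    "security": [0.5, 0.5],
}
query = [2.0, 0.0]  # stands in for a learned attention parameter

fused, weights = fuse_modalities(embeddings, query)
print(weights)  # "category" receives the largest weight for this query
```

The returned weights also offer a rough view of which feature type mattered most for a given website.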
How PASSREFINDER-FL Works
The framework operates through three main components:
Graph Construction: Each website administrator independently builds a local password reuse graph based on the websites they manage. An edge connects two website nodes if the proportion of users reusing their passwords across those sites exceeds a certain threshold. Importantly, these local graphs are not shared between administrators.
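As a rough sketch of this step, the snippet below builds a local graph from toy credential data. The article does not spell out how the reuse proportion is computed, so this sketch assumes it is the fraction of shared users whose password digest matches on both sites; the data, the `reuse_rate` and `build_local_graph` helpers, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Maps user identifier -> password digest for one website.
SiteCredentials = Dict[str, str]


def reuse_rate(site_a: SiteCredentials, site_b: SiteCredentials) -> float:
    """Fraction of users present on both sites who use the same password on each."""
    shared_users = site_a.keys() & site_b.keys()
    if not shared_users:
        return 0.0
    reused = sum(1 for user in shared_users if site_a[user] == site_b[user])
    return reused / len(shared_users)


def build_local_graph(
    credentials: Dict[str, SiteCredentials],
    threshold: float,
) -> List[Tuple[str, str]]:
    """Return an edge for every site pair whose reuse rate exceeds `threshold`."""
    edges = []
    for a, b in combinations(sorted(credentials), 2):
        if reuse_rate(credentials[a], credentials[b]) > threshold:
            edges.append((a, b))
    return edges


# Toy data: "h1", "h2", ... are placeholder password digests.
credentials = {
    "shop.example": {"alice": "h1", "bob": "h2", "carol": "h3"},
    "forum.example": {"alice": "h1", "bob": "h2", "dave": "h9"},
    "news.example": {"alice": "h7", "bob": "h8", "carol": "h3"},
}

local_edges = build_local_graph(credentials, threshold=0.5)
print(local_edges)  # [('forum.example', 'shop.example')]
```

Only the administrator who holds these credentials runs this step, and the resulting edge list stays local.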
Feature Extraction: Various public website features are extracted from sources like web analytics services. These features, categorized into modalities such as location (from IP addresses), category (from services like McAfee), content (HTML text analyzed by multilingual language models), URL characteristics, and security posture (from Shodan, CVE, CVSS, HTTPS adoption), are then vectorized as input for the GNNs.
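A simplified sketch of the vectorization step might look like the following. It covers only three of the modalities and uses hand-picked encodings; the category vocabulary, the specific URL statistics, and the input record format are assumptions for illustration, not the paper's feature set.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical category vocabulary; a real system would use the labels
# returned by its categorization service.
CATEGORIES = ["shopping", "forum", "news"]


def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    """Encode a categorical value as a one-hot vector."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def url_features(url: str) -> List[float]:
    """Simple hostname statistics: length, number of dots, number of digits."""
    host = urlparse(url).hostname or ""
    return [
        float(len(host)),
        float(host.count(".")),
        float(sum(ch.isdigit() for ch in host)),
    ]


def security_features(uses_https: bool, max_cvss: float) -> List[float]:
    """HTTPS flag plus the highest CVSS base score, scaled from 0-10 to 0-1."""
    return [1.0 if uses_https else 0.0, max_cvss / 10.0]


def vectorize_site(site: Dict) -> Dict[str, List[float]]:
    """Return one vector per modality so each can be processed separately."""
    return {
        "category": one_hot(site["category"], CATEGORIES),
        "url": url_features(site["url"]),
        "security": security_features(site["uses_https"], site["max_cvss"]),
    }


site = {
    "url": "https://shop1.example.com/login",
    "category": "shopping",
    "uses_https": True,
    "max_cvss": 7.5,
}
vectors = vectorize_site(site)
print(vectors["url"])  # [17.0, 2.0, 1.0]
```

Keeping one vector per modality, rather than concatenating everything up front, matches the idea of processing each feature type independently before combining them.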
Graph Federated Learning & Cross-Admin Edge Representation: This is where the privacy-preserving magic happens. Administrators collaboratively train a GNN-based model. During training, each administrator updates their local GNN weights using their private data and sends only these updated weights to a central server. The server aggregates these weights to refine a global model, which is then redistributed to all participants. For inference, administrators compute node representations locally and then exchange only these anonymized vectors to predict cross-admin password reuse relations. This ensures no sensitive information like passwords or usernames is ever shared.
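The server-side aggregation can be illustrated with a short sketch. The article does not state the exact aggregation rule, so this example assumes the widely used federated averaging scheme, in which each administrator's weights are averaged in proportion to the size of their local training data. The flat weight lists and sample counts are toy values.

```python
from typing import List


def federated_average(
    client_weights: List[List[float]],
    client_sizes: List[int],
) -> List[float]:
    """Average client weight vectors, weighted by local training set size."""
    if len(client_weights) != len(client_sizes):
        raise ValueError("one size is required per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes))
        / total
        for i in range(dim)
    ]


# Two administrators send locally updated weights; raw data never leaves them.
admin_a_weights = [1.0, 2.0]  # trained on 1 local example (toy value)
admin_b_weights = [3.0, 4.0]  # trained on 3 local examples (toy value)

global_weights = federated_average(
    [admin_a_weights, admin_b_weights],
    client_sizes=[1, 3],
)
print(global_weights)  # [2.5, 3.5]
```

In a full training loop, the server would send `global_weights` back to every administrator, who would resume local training from them in the next round.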
Impressive Performance and Practical Applications
Evaluated on a real-world dataset of 360 million breached accounts from 22,378 websites, PASSREFINDER-FL achieved an F1-score of 0.9153 in the federated learning setting. This significantly outperforms previous methods, including its original non-FL counterpart, PASSREFINDER, and other state-of-the-art GNN models. The study also showed that the system’s performance improves with more participating administrators, reaching practical effectiveness with as few as five participants.
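For readers less familiar with the metric, the F1-score is the harmonic mean of precision and recall, which can be written directly in terms of prediction counts. The sketch below uses made-up counts purely to show the arithmetic; they are not taken from the paper's evaluation.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), equivalent to the harmonic mean of precision and recall."""
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        return 0.0
    return 2 * true_pos / denominator


# Illustrative counts for predicted password reuse edges.
score = f1_score(true_pos=8, false_pos=2, false_neg=2)
print(score)  # 0.8
```

A score near 1 means the model both finds most high-risk website pairs and raises few false alarms.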
Beyond binary risk prediction, PASSREFINDER-FL can quantify password reuse likelihood as actionable risk scores, providing a more fine-grained understanding of potential threats. This allows administrators to prioritize website pairs by their relative risks.
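One simple way to turn node representations into risk scores is to apply a sigmoid to the dot product of two website vectors and then sort the pairs. This decoder is an assumption chosen for illustration; the article does not specify how PASSREFINDER-FL computes its scores, and the vectors below are toy values.

```python
import math
from typing import Dict, List, Tuple


def risk_score(vec_a: List[float], vec_b: List[float]) -> float:
    """Map the dot product of two node vectors to a score in (0, 1)."""
    logit = sum(a * b for a, b in zip(vec_a, vec_b))
    return 1.0 / (1.0 + math.exp(-logit))


def rank_pairs(
    local_nodes: Dict[str, List[float]],
    remote_nodes: Dict[str, List[float]],
) -> List[Tuple[float, str, str]]:
    """Score every local/remote website pair and sort from highest to lowest risk."""
    scored = [
        (risk_score(vec_a, vec_b), site_a, site_b)
        for site_a, vec_a in local_nodes.items()
        for site_b, vec_b in remote_nodes.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical node vectors: one local site, two sites run by another administrator.
local_nodes = {"shop.example": [1.0, 0.5]}
remote_nodes = {
    "forum.example": [2.0, 0.0],
    "bank.example": [-1.0, 0.0],
}

ranked = rank_pairs(local_nodes, remote_nodes)
for score, site_a, site_b in ranked:
    print(f"{site_a} <-> {site_b}: {score:.2f}")
```

Here the pair involving `forum.example` ranks first, so an administrator would review it before the lower-scoring pair.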
The practical applications of PASSREFINDER-FL are substantial:
- Reasonable Warning: Instead of broad, often ignored warnings, administrators can issue selective, targeted warnings to users most at risk, providing specific reasons based on website features.
- Selective 2FA: Two-factor authentication can be selectively recommended or enforced for users identified as high-risk, balancing security with user convenience.
- Practical Coordination: The framework facilitates the formation of effective coordination pools among administrators, allowing them to collaboratively detect password reuse or credential stuffing without the scalability and privacy issues of older methods.
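As a sketch of how the first two applications could be wired to a risk score, the function below maps a score to an action. The action names and both thresholds are hypothetical; an administrator would tune them to their own tolerance for friction and risk.

```python
def choose_action(risk: float, warn_at: float = 0.5, enforce_at: float = 0.9) -> str:
    """Pick a response for a user whose accounts span a website pair with this risk score."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be between 0 and 1")
    if risk >= enforce_at:
        return "require_2fa"
    if risk >= warn_at:
        return "warn_user"
    return "no_action"


for risk in (0.2, 0.7, 0.95):
    print(risk, choose_action(risk))
```

Low-risk users see no extra friction, while the strongest measure is reserved for the highest-scoring pairs.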
For more technical details, you can refer to the full research paper: PASSREFINDER-FL: Privacy-Preserving Credential Stuffing Risk Prediction via Graph-Based Federated Learning for Representing Password Reuse between Websites.
PASSREFINDER-FL represents a significant leap forward in combating credential stuffing. By combining graph neural networks with federated learning, it offers a powerful, scalable, and privacy-preserving solution that empowers website administrators to proactively protect their users from one of the most prevalent cyber threats.


