AI's Edge in Anti-Money Laundering: Learning from Transaction Patterns

TLDR: A new AI method uses a transformer neural network and contrastive learning to detect money laundering. It analyzes raw transaction data, learns patterns without needing labels, and then employs a two-threshold system to identify fraudsters while keeping false alarms low, significantly outperforming traditional rule-based and other machine learning approaches.

Money laundering is a significant global problem, estimated to involve 2 to 5% of the world’s GDP annually. It undermines governments’ ability to collect taxes and fight crime, and it damages the stability of financial institutions. Detecting it is extremely challenging because perpetrators often mimic legal financial behaviors, and money laundering patterns constantly evolve to evade detection.

Financial institutions are required by strict regulations, collectively known as the ‘anti-money laundering’ (AML) framework, to detect and report suspicious activity. To cope with the massive volume of customers and transactions, they often rely on automated ‘rule-based systems’. While these systems offer explainability, they suffer from severe limitations, particularly very low precision: 95% to 98% of the alerts they raise are false positives. This inefficiency stems from their reliance on hard-coded thresholds that are difficult to update as fraud patterns change.

This new research introduces a novel approach that leverages machine learning to address these challenges, complementing existing rule-based frameworks. Instead of relying on aggregated, summarized features of customer activity, which often lose valuable information and require expert knowledge to design, this work processes the entire set of raw transaction time series.

A New Approach: Transformers and Contrastive Learning

The core of this new procedure involves a transformer neural network, a powerful tool initially developed for natural language processing. Just as language can be seen as a ‘time series of words’, financial transactions can be viewed as a ‘time series of events’. The transformer is adept at capturing long-range dependencies within these sequences, which is crucial for identifying money laundering patterns that might unfold over several months.

A key innovation is the use of ‘contrastive learning’ to pre-train the transformer. This is done without any labeled data, which is a major advantage given how scarce and unreliable fraud labels are in real-world scenarios. Contrastive learning works by teaching the model to recognize similarities and differences between observations. It learns representations in which similar financial activities are mapped close together in the embedding space, while dissimilar ones are pushed further apart. This self-supervised approach helps the transformer learn the underlying ‘semantics’ of financial transactions.
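To make the idea concrete, here is a minimal sketch of one common contrastive objective, an InfoNCE-style loss. The article does not say which loss the researchers use, so the loss form, the cosine similarity, the `temperature` value, and the function names are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one reference ("anchor") representation.

    Assumption: this is a generic contrastive loss, not necessarily the
    one used in the research described above.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    # The loss shrinks as the positive dominates the negatives.
    return -math.log(pos / (pos + neg))


# Hypothetical 2-D representations, for illustration only.
anchor = [1.0, 0.0]
well_placed = info_nce_loss(anchor, [0.9, 0.1], [[-1.0, 0.0], [0.0, -1.0]])
badly_placed = info_nce_loss(anchor, [-1.0, 0.0], [[0.9, 0.1]])
print(well_placed < badly_placed)
```

The loss is small when the reference sits close to its positive example and far from its negatives, and large when the positions are swapped, which is the geometry contrastive pre-training aims for.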

The process involves feeding raw transaction data into the transformer, which generates a compact numerical representation (an ‘embedding’) for each account. A ‘projection head’ then further refines this representation into a lower-dimensional space, making computations more efficient. To enhance the learning process, the system samples ‘positive examples’ (observations similar to a reference) and ‘negative examples’ (dissimilar observations) based on auxiliary data that contains aggregated metrics and customer descriptors. It even adds a small amount of Gaussian noise to these examples in the projection space to explore new patterns and prevent overfitting, especially useful in highly imbalanced datasets where fraudsters are rare.
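The sampling and noise steps can be sketched roughly as follows. The article does not specify how similarity in the auxiliary data is measured, so picking the nearest and farthest accounts by Euclidean distance, the helper names, and the noise scale `sigma` are assumptions made purely for illustration.

```python
import math
import random


def sample_positive_negative(ref_idx, aux, k=1):
    """Pick the k most similar and k least similar accounts to a reference.

    `aux` holds one auxiliary feature vector per account (aggregated
    metrics, customer descriptors). Euclidean distance is an assumption.
    """
    ref = aux[ref_idx]
    others = [i for i in range(len(aux)) if i != ref_idx]
    others.sort(key=lambda i: math.dist(ref, aux[i]))
    return others[:k], others[-k:]


def add_gaussian_noise(projection, sigma, rng):
    """Perturb a projected representation with isotropic Gaussian noise."""
    return [x + rng.gauss(0.0, sigma) for x in projection]


# Hypothetical auxiliary vectors for four accounts.
aux = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.2]]
positives, negatives = sample_positive_negative(0, aux, k=1)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
noisy = add_gaussian_noise([1.0, 2.0], sigma=0.05, rng=rng)
```

Here account 1 is chosen as the positive example for account 0 and account 2 as the negative one; the noisy copy stays close to the original projection while adding a little variety to the training signal.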

Two-Threshold Classification for Better Detection

Once the transformer has learned these powerful representations, they are used for the downstream task of money laundering detection. This involves a classification step where a simple logistic regression classifier is trained on a small amount of labeled data (accounts identified as fraudsters or non-fraudsters).
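As a rough illustration of this downstream step, the sketch below fits a logistic regression on a handful of labeled embeddings using plain gradient descent. The toy embeddings, the learning rate, the number of epochs, and the function names are assumptions; in practice a library implementation would typically be used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def suspicion_score(x, w, b):
    """Estimated probability that an embedding belongs to a fraudster."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical 2-D embeddings: label 1 = fraudster, 0 = non-fraudster.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [2.0, 1.9], [1.8, 2.1]]
y = [0, 0, 0, 1, 1]
w, b = fit_logistic(X, y)
```

The resulting scores in the range 0 to 1 are what a thresholding step would then operate on.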

To tackle the severe class imbalance (where fraudsters are a very small percentage of accounts), the research introduces a ‘two-thresholds’ classification procedure:

Low Threshold (Tl): This threshold helps identify and discard the least suspicious observations. Accounts with scores below Tl are confidently declared non-fraudulent, saving analysts time by removing them from further investigation.
High Threshold (Th): This threshold targets the most suspicious observations. Accounts with scores above Th are flagged as potential fraudsters for in-depth investigation.
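The two thresholds split accounts into three groups, which can be sketched as below. The threshold values, account identifiers, and the name of the middle group are hypothetical; the article does not say how accounts that fall between Tl and Th are handled, so they are simply kept apart here.

```python
def triage(scores, t_low, t_high):
    """Split accounts by score into cleared, undecided, and flagged."""
    if t_low > t_high:
        raise ValueError("t_low must not exceed t_high")
    cleared, undecided, flagged = [], [], []
    for account_id, score in scores.items():
        if score < t_low:
            cleared.append(account_id)    # declared non-fraudulent
        elif score > t_high:
            flagged.append(account_id)    # sent for in-depth investigation
        else:
            undecided.append(account_id)  # assumption: left to other processes
    return cleared, undecided, flagged


# Hypothetical scores from the classifier.
scores = {"acc-1": 0.02, "acc-2": 0.45, "acc-3": 0.97, "acc-4": 0.10}
cleared, undecided, flagged = triage(scores, t_low=0.05, t_high=0.90)
```

With these made-up numbers, one account is cleared, one is flagged, and two remain undecided.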

Crucially, both thresholds are calibrated using the Benjamini-Hochberg (BH) procedure, a statistical method that controls the ‘False Discovery Rate’ (FDR), the expected proportion of false discoveries among all the accounts the procedure singles out. For the high threshold, this means the expected share of false positives (legitimate accounts wrongly flagged as suspicious) among flagged accounts is kept below a prescribed level, a significant improvement over traditional rule-based systems, whose alerts are overwhelmingly false positives.
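The BH step itself is standard and can be sketched in a few lines. What is assumed here is how scores become p-values: the sketch uses an empirical p-value computed against scores of known non-fraudsters from a held-out calibration set, which may differ from the construction used in the research. All function names are illustrative.

```python
def empirical_p_values(test_scores, null_scores):
    """P-value of each test score against calibration non-fraudster scores.

    Assumption: a high score means "more suspicious", and `null_scores`
    come from accounts known to be legitimate.
    """
    n = len(null_scores)
    return [
        (1 + sum(1 for s0 in null_scores if s0 >= s)) / (n + 1)
        for s in test_scores
    ]


def benjamini_hochberg(p_values, alpha):
    """Return indices rejected by the BH step-up procedure at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its BH bound.
        if p_values[i] <= rank * alpha / m:
            k_max = rank
    return sorted(order[:k_max])


# Hypothetical p-values for five accounts, target FDR of 10%.
flagged = benjamini_hochberg([0.01, 0.04, 0.03, 0.5, 0.8], alpha=0.1)
```

In this toy run the first three accounts are flagged. The score of the least suspicious flagged account then plays the role of the high threshold; the low threshold could be calibrated in a mirrored way with the roles of the two classes swapped, though the article does not spell out those details.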

Experimental Validation

The methodology was tested on a real-life, anonymized dataset of company bank accounts, comprising complex time series of transactions with both quantitative and qualitative features. The dataset reflected the real-world challenge of strong class imbalance, with fraudsters making up only 5% of the test set.

Visualizations of the learned representations showed that the transformer, trained with contrastive learning, was able to create distinct clusters for fraudsters, making them easier to distinguish than in representations learned by other methods such as LSTM autoencoders or traditional tabular data approaches. The transformer-based approach consistently outperformed competitors in terms of separating the score distributions of fraudsters and non-fraudsters.

When applying the two-thresholds procedure, the transformer-based method (especially with fine-tuning) detected significantly more true fraudsters at a given FDR level than the other models. For instance, at an FDR of 0.40, it detected more than twice as many fraudsters as the LSTM-based approach. Similarly, for the low threshold, it successfully identified a much larger percentage of non-fraudulent accounts, further optimizing investigation resources.

In conclusion, this research presents a robust and adaptive framework for money laundering detection. By combining the power of transformer neural networks with contrastive learning and a controlled two-threshold classification, it offers a promising path to overcome the limitations of traditional systems, leading to more efficient and accurate identification of financial crime. You can read the full research paper here: Representation learning with a transformer by contrastive learning for money laundering detection.
