SHERLOCK: Enhancing E-commerce Risk Management with Adaptive AI

TLDR: SHERLOCK is a novel framework that uses large language models (LLMs) to improve e-commerce risk management. It addresses challenges like high workload, inconsistent judgments, and evolving fraud patterns by integrating a dynamic Domain Knowledge Base, a Data Flywheel for continuous learning, and a Reflect & Refine (R&R) module. Experiments on JD.com data show SHERLOCK significantly boosts the precision of risk assessments, reduces LLM hallucinations, and drastically improves operational efficiency and expert trust, enabling faster and more accurate fraud detection.

The rapid expansion of e-commerce has brought with it a significant challenge: the escalating battle against sophisticated fraud and shadow economy activities. Risk management teams are constantly overwhelmed by the sheer volume of suspicious cases, each demanding meticulous investigation and deep expert knowledge. This intensive manual process leads to substantial workloads for analysts, inconsistencies in judgment, and slow adaptation to new fraud patterns.

To address these critical issues, researchers have introduced SHERLOCK, an innovative framework designed to enhance e-commerce risk management by leveraging the advanced reasoning capabilities of large language models (LLMs). SHERLOCK aims to provide dynamic knowledge adaptation, making risk investigations more efficient, accurate, and consistent.

Understanding SHERLOCK’s Core Components

The SHERLOCK framework is built upon three primary components that work together to create a robust and adaptive system:

1. Domain Knowledge Base (KB): This is the brain of the system, a comprehensive repository of risk management knowledge. It’s constructed by extracting valuable insights from various sources, including business documents, meeting recordings, and even code repositories. This multi-modal data is transformed into structured knowledge covering specialized terminology (e.g., clarifying specific platform services), complex business logic (e.g., understanding why certain transaction patterns are normal in specific contexts), and evolving risk patterns (e.g., identifying new fraud indicators and their thresholds).

2. Data Flywheel: This component establishes a continuous learning loop, integrating daily operations, expert feedback, and model evaluations. It’s designed to efficiently generate high-quality training data for LLMs at minimal cost. When LLM-generated conclusions don’t meet expectations, these cases are prioritized for expert annotation, ensuring that resources are focused on the most informative samples. A unique “selection-over-creation” strategy simplifies annotation, where experts select and refine LLM-generated insights rather than creating them from scratch. Furthermore, a “suspect-then-rule-out” framework guides the LLM to simulate expert reasoning, improving its analytical processes over time.

3. Reflect & Refine (R&R) Module: This module acts as a post-analysis inspection layer. Its main functions are to mitigate LLM hallucinations (incorrect or fabricated information) and to enable rapid adaptation to emerging risk patterns. It does this by retrieving relevant knowledge from the Domain KB to fact-check and refine the LLM’s initial risk assessments. The R&R module also supports real-time “hotfixes”: new business logic or policy adjustments can be written to the knowledge base immediately, without a full model retraining, so the system stays agile in a dynamic environment.
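To make the interplay of the three components more concrete, here is a minimal Python sketch. It is not the authors' implementation: the class and function names (KnowledgeBase, Finding, reflect_and_refine, flywheel_queue), the "NORMAL:" rule prefix, the keyword-matching retrieval, and the example cases are all assumptions made up for illustration. The sketch shows a KB that accepts hotfixes, an R&R pass that rules out an initial suspicion when a retrieved rule says the pattern is normal, and a flywheel step that sends corrected cases to experts first.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical stand-in for the Domain KB: keyword -> rule text."""
    entries: dict = field(default_factory=dict)

    def hotfix(self, keyword: str, rule: str) -> None:
        # A "hotfix" here is only an in-place KB update; no retraining involved.
        self.entries[keyword.lower()] = rule

    def retrieve(self, text: str) -> list:
        # Naive keyword match (assumption); a real system would likely
        # use some form of semantic retrieval instead.
        lowered = text.lower()
        return [rule for kw, rule in self.entries.items() if kw in lowered]


@dataclass
class Finding:
    claim: str           # one risk statement produced by the LLM
    flagged_risky: bool  # the LLM's initial judgment ("suspect")


def reflect_and_refine(findings, kb):
    """Split findings into those kept and those ruled out by a KB rule."""
    kept, overturned = [], []
    for f in findings:
        rules = kb.retrieve(f.claim)
        # "Rule out": an initially risky claim is dropped when the KB
        # holds a rule marking the pattern as normal business behavior.
        ruled_out = f.flagged_risky and any(r.startswith("NORMAL:") for r in rules)
        (overturned if ruled_out else kept).append(f)
    return kept, overturned


def flywheel_queue(overturned):
    """Cases where the first-pass output was corrected go to experts first."""
    return [f.claim for f in overturned]


# Invented example data, for illustration only.
kb = KnowledgeBase()
kb.hotfix("bulk gift card", "NORMAL: bulk gift card orders are expected during promotions")
findings = [
    Finding("Account placed bulk gift card orders", True),
    Finding("Shipping address changed 5 times in one hour", True),
]
kept, overturned = reflect_and_refine(findings, kb)
print([f.claim for f in kept])
print(flywheel_queue(overturned))
```

In this toy run, the gift card finding is ruled out by the hotfixed rule and queued for expert review, while the address-change finding remains in the refined assessment. Adding a further rule with kb.hotfix changes the outcome of the next run without touching any model.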

Real-World Impact and Performance

Experiments conducted on a real-world transaction dataset from JD.com demonstrated SHERLOCK’s significant impact. The framework substantially improved the precision of LLM analysis results, both in factual alignment and in accurately pinpointing risks. Compared to traditional methods and even powerful general-purpose LLMs, SHERLOCK showed a dramatic increase in its Signal-to-Noise Ratio (SNR), indicating a much higher proportion of relevant risk factors to irrelevant ones.
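The article does not give the paper's exact SNR formula, so the following is only a sketch under the assumption that SNR is the count of relevant risk factors in an analysis divided by the count of irrelevant ones. The function name signal_to_noise and the sample factor labels are hypothetical.

```python
def signal_to_noise(reported_factors, relevant_factors):
    """Ratio of relevant to irrelevant factors in one LLM analysis.

    Assumed definition: signal = reported factors that experts consider
    relevant, noise = all other reported factors.
    """
    relevant = set(relevant_factors)
    signal = sum(1 for f in reported_factors if f in relevant)
    noise = len(reported_factors) - signal
    if noise == 0:
        # No irrelevant factors at all: the ratio is unbounded.
        return float("inf")
    return signal / noise


# Invented example: four reported factors, three judged relevant by an expert.
reported = ["shared device", "address churn", "new account", "weekend order"]
expert_relevant = {"shared device", "address churn", "new account"}
print(signal_to_noise(reported, expert_relevant))  # 3.0
```

Under this reading, a higher value means an analyst has to wade through fewer irrelevant factors for each useful one.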

Human experts overwhelmingly preferred SHERLOCK’s output, rating it highly for trustworthiness and helpfulness in decision-making. In live A/B tests on JD.com’s platform, the deployment of the SHERLOCK-based LLM system led to remarkable improvements in operational efficiency. Risk managers were able to make decisions 387% faster, and the expert acceptance rate of the LLM’s recommendations soared to 82%, signifying a massive increase in trust and reliability.

A Step Towards Adaptive Risk Management

SHERLOCK represents a significant advancement in applying LLMs to complex, domain-specific challenges like e-commerce risk management. By combining structured domain knowledge, continuous learning, and a reflective refinement process, it creates an evolvable architecture capable of adapting to the ever-changing landscape of online fraud. This framework not only enhances the accuracy and interpretability of risk assessments but also empowers human experts with more efficient and reliable tools. For more details, you can refer to the original research paper.
