TLDR: OPTAGENT is a novel framework that optimizes e-commerce search queries by using a multi-agent simulation and genetic algorithms. LLM-based agents act as diverse shopping customers, evaluating product relevance and purchase intent to create a dynamic ‘fitness score’ for queries. This score then guides an evolutionary algorithm to iteratively refine queries. The framework significantly improves query performance, especially for challenging, infrequent ‘tail queries’, demonstrating a powerful method for optimizing LLMs in subjective domains without relying on traditional, static reward signals.
In the fast-paced world of e-commerce, millions of users search for products daily on platforms like Amazon and Etsy. These queries are often short, ambiguous, or riddled with typos, making it hard for search engines to infer what the user actually wants. Reformulating such queries, a task known as Query Rewriting (QR), is crucial for connecting shoppers with the products they truly desire. However, evaluating whether a rewritten query genuinely captures user intent is subjective, with no single ‘correct’ answer, which makes traditional optimization methods difficult to apply.
Understanding the Challenge
Large Language Models (LLMs) have shown remarkable capabilities in various tasks, especially those with clear, verifiable solutions like coding or mathematics. But for subjective tasks like e-commerce query rewriting, where the ‘gold standard’ is elusive, their adoption faces hurdles. Existing methods often rely on human feedback, which is costly and slow, or a single LLM acting as a judge. However, a single LLM judge can be prone to biases, lack robustness, and be unreliable when evaluating complex criteria.
Introducing OPTAGENT: A Novel Approach
A new framework called OPTAGENT addresses this fundamental challenge by combining multi-agent simulations with genetic algorithms to verify and optimize queries for e-commerce query rewriting. Instead of a static reward model or a single LLM judge, OPTAGENT employs multiple LLM-based agents, each simulating a unique shopping customer, to provide a dynamic reward signal. This collective judgment forms an effective ‘fitness function’ for an evolutionary algorithm that continuously refines the user’s initial query.
How OPTAGENT Works
The OPTAGENT framework operates in two primary stages: multi-agent evaluation and genetic algorithm optimization.
Multi-Agent Evaluation: OPTAGENT uses an ensemble of LLM-based agents, each acting as a simulated user. To ensure diverse reasoning paths and avoid the biases that predefined ‘personas’ can introduce, each agent is initialized with a different ‘temperature’ setting: a lower temperature yields more deterministic outputs, while a higher temperature encourages more exploratory, varied responses.

For a given rewritten query, each agent searches the shopping platform, analyzes the first page of results (product title, description, image, price, reviews, shipping), and assigns each product a semantic relevance score (Fully Relevant, Partially Relevant, or Irrelevant). After evaluating all products, the agent decides which ones it would ‘purchase’ and tallies a total raw purchase value. These individual judgments are then aggregated into a single continuous fitness score that combines the average semantic score of the top-10 products, the average semantic score of all retrieved products, and a normalized purchase value.
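The aggregation step above can be sketched as follows. This is a minimal illustration, not the paper’s exact formula: the numeric mapping of the relevance labels, the equal weighting of the three components, and the function names are all assumptions.

```python
from statistics import mean

# Assumed numeric mapping for the three relevance labels.
SEMANTIC_SCORE = {"fully": 1.0, "partial": 0.5, "irrelevant": 0.0}

def agent_fitness(labels, purchase_value, max_purchase_value):
    """Aggregate one simulated shopper's judgments into a fitness score.

    labels: relevance label per retrieved product, in result-page order.
    purchase_value: total raw value of the products the agent would 'buy'.
    """
    scores = [SEMANTIC_SCORE[label] for label in labels]
    top10 = mean(scores[:10])          # average semantic score of top-10 products
    overall = mean(scores)             # average semantic score of all products
    norm_purchase = (purchase_value / max_purchase_value
                     if max_purchase_value else 0.0)
    # Equal weighting is an assumption; the paper may combine these differently.
    return (top10 + overall + norm_purchase) / 3

def query_fitness(agent_results):
    """Average the score across the agent ensemble
    (each agent run at a different sampling temperature)."""
    return mean(agent_fitness(*result) for result in agent_results)
```

Averaging over several agents with varied temperatures smooths out the noise any single LLM judge would introduce, which is what makes the score usable as a fitness function.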
Genetic Algorithms for Optimization: The simulation-based fitness function guides a genetic algorithm, which is inspired by natural selection. This algorithm is robust to the ‘noisy’ and subjective nature of the fitness score. The process involves:
- Initial Population Generation: An LLM generates several diverse, semantically similar versions of the original user query to start the optimization process.
- Selection: The top-performing queries (based on their fitness scores) from the current generation are directly passed to the next generation.
- Crossover: With a certain probability, two parent queries are selected, and an LLM combines their meaningful semantic elements to create a new ‘child’ query.
- Mutation: With another probability, a selected query undergoes a small but meaningful alteration (e.g., using a synonym, reordering words) by an LLM to create a new variant.
This iterative process continues for a fixed number of generations, with the goal of discovering high-fitness queries. The final output is the query with the highest fitness score found throughout the entire evolutionary process.
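The four steps above can be sketched as a compact evolutionary loop. This is a sketch under stated assumptions: the LLM operators are stubbed out with placeholder functions, and the population size, elitism count, and crossover/mutation probabilities are illustrative, not the paper’s settings.

```python
import random

# Hypothetical stand-ins for the LLM calls behind each evolutionary operator.
def llm_rewrite_variants(query, n):   # initial population generation
    return [f"{query} (variant {i})" for i in range(n)]

def llm_crossover(a, b):              # combine semantic elements of two parents
    return f"{a} + {b}"

def llm_mutate(q):                    # small but meaningful alteration
    return f"{q} (reworded)"

def evolve(query, fitness, generations=3, pop_size=6,
           elite=2, p_cross=0.5, p_mut=0.3, seed=0):
    """Genetic-algorithm sketch: a simulation-based fitness function guides
    selection, crossover, and mutation over query rewrites."""
    rng = random.Random(seed)
    population = [query] + llm_rewrite_variants(query, pop_size - 1)
    best = max(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:elite]                      # selection: keep the elite
        while len(next_gen) < pop_size:
            roll = rng.random()
            if roll < p_cross:                         # crossover
                a, b = rng.sample(ranked[: pop_size // 2], 2)
                next_gen.append(llm_crossover(a, b))
            elif roll < p_cross + p_mut:               # mutation
                next_gen.append(llm_mutate(rng.choice(ranked)))
            else:                                      # carry over unchanged
                next_gen.append(rng.choice(ranked))
        population = next_gen
        best = max([best] + population, key=fitness)   # track best seen so far
    return best
```

Note that the best query is tracked across all generations, not just the last one, matching the framework’s final-output rule.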
Key Findings and Performance
OPTAGENT was evaluated on a dataset of 1000 real-world e-commerce queries across five categories. The results showed significant improvements:
- On average, OPTAGENT improved query fitness by 21.98% over the original user query.
- It outperformed a Best-of-N LLM rewriting baseline by 3.36%.
- The framework was particularly effective for ‘tail queries’ (infrequent search terms), showing the largest relative improvement (28.67%). This is crucial because traditional methods struggle with tail queries due to a lack of historical data.
- Performance consistently improved across generations, indicating that the evolutionary operators effectively discover better queries over time.
- An ablation study confirmed that the evolutionary operations, especially the ‘crossover’ mechanism, are critical for achieving peak performance.
The evaluation agents also exhibited a position bias, similar to real users, preferring products listed higher in search results. While some limitations were noted (e.g., difficulty parsing hidden information in interactive website elements or over-reliance on customer reviews for new products), the agents showed a moderate and meaningful alignment with human judgment.
Conclusion
OPTAGENT offers a generalizable and scalable solution for optimizing LLMs in subjective domains where explicit reward signals are scarce. By replacing static reward functions with a dynamic fitness evaluation derived from a multi-agent simulation, it creates a rich and nuanced landscape that better captures the complexity of human preference. This approach, detailed further in the research paper available at arXiv:2510.03771, opens new avenues for developing more capable and aligned AI systems in a wide range of human-centric applications, particularly in e-commerce query rewriting.