New AI Model Enhances Video Search by Resolving Multi-Agent Conflicts and Detecting Irrelevant Queries

TLDR: Researchers developed MARLCC, a multi-agent reinforcement learning framework for video moment retrieval. It uses evidential learning to allow agents to compete and resolve conflicts, leading to improved accuracy in finding video moments. Crucially, it can detect ‘out-of-scope’ queries (queries with no matching video moment) in a zero-shot manner by observing high conflict among agents, eliminating the need for extra training.

Video moment retrieval is a fascinating area of artificial intelligence that helps us quickly find specific moments within long, untrimmed videos using simple text queries. Imagine trying to find “the person starts cooking with a pan” in a two-hour movie – this technology aims to pinpoint that exact scene for you. This capability is incredibly useful for various applications, from searching movie scenes to monitoring surveillance footage for specific events or analyzing athlete performance.

Traditionally, video moment retrieval models have assumed that the queried moment exists somewhere in the video. However, a significant challenge arises when a user’s query has no corresponding moment in the video at all, which is known as an “out-of-scope” query. Current systems often struggle with this: they either require additional training to detect such queries or fail to integrate different models effectively when their results conflict.

A new research paper titled “Who Can We Trust? Scope-Aware Video Moment Retrieval with Multi-Agent Conflict” by Chaochen Wu, Guan Luo, Meiyun Zuo, and Zhitao Fan introduces a novel approach to tackle these issues. The researchers propose a reinforcement learning-based model that not only accurately locates video moments but also intelligently handles conflicts between different models and identifies out-of-scope queries without needing extra training.

A Multi-Agent System for Smarter Retrieval

The core of their innovation is a multi-agent system (MAS) framework called MARLCC (Multi-Agent RL Competition and Conflict). It runs multiple independent “agents” (essentially different AI models) on the same video moment retrieval task. One of these agents is a newly proposed model called ESRL (Evidential Scanner for RL-based MR), which scans the entire video to find moment boundaries and uses evidential learning to attach evidence to its predictions.

Evidential learning is a key component. When an agent makes a prediction about a moment’s location, it also generates “evidence” and “uncertainty” for that prediction. Think of it as the agent not just saying “this is the moment,” but also “I’m this confident about it.” This allows the system to understand how reliable each agent’s output is.
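To make the idea of “evidence” and “uncertainty” concrete, here is a minimal sketch of one common formulation from evidential deep learning, in which non-negative evidence for K outcomes is turned into belief masses and a leftover uncertainty mass. The article does not say which formulation ESRL uses, so treat the function name `belief_and_uncertainty` and the two-outcome example as illustrative assumptions rather than the authors’ method.

```python
from typing import List, Tuple


def belief_and_uncertainty(evidence: List[float]) -> Tuple[List[float], float]:
    """Map non-negative evidence for K outcomes to belief masses and uncertainty.

    Assumed formulation (Dirichlet-based evidential learning, not confirmed
    as the one used by ESRL):
        S   = sum(e_k + 1)   # total strength
        b_k = e_k / S        # belief in outcome k
        u   = K / S          # uncertainty mass, so sum(b_k) + u == 1
    """
    if not evidence:
        raise ValueError("need at least one outcome")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    k = len(evidence)
    strength = sum(e + 1.0 for e in evidence)
    beliefs = [e / strength for e in evidence]
    uncertainty = k / strength
    return beliefs, uncertainty


# Hypothetical numbers: lots of evidence for one outcome vs. almost none.
confident_beliefs, confident_u = belief_and_uncertainty([18.0, 0.0])
unsure_beliefs, unsure_u = belief_and_uncertainty([0.5, 0.5])
print(confident_u, unsure_u)
```

With more collected evidence the uncertainty mass shrinks, which is the sense in which an agent can report how confident it is alongside its prediction.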

Competition and Conflict Among Agents

MARLCC leverages two main concepts: competition and conflict. In the “competition” aspect, different agents independently propose their best-located moments. The system then uses the “evidence” generated by each agent to determine a “trusted IoU” (Intersection over Union) score, which indicates how well the predicted moment overlaps with the actual moment. The agent with the highest trusted IoU is declared the “winner,” and its result is chosen as the final output. This allows the system to combine the strengths of various agents.
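The competition step can be sketched roughly as follows. The temporal IoU formula is the standard one for two time intervals; how MARLCC derives the trusted IoU from evidence is not described in this article, so the sketch simply takes each agent’s `trusted_iou` as a given number. The `Proposal` class, the agent names, and the values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Proposal:
    agent: str          # hypothetical agent label
    start: float        # predicted start time in seconds
    end: float          # predicted end time in seconds
    trusted_iou: float  # assumed to be supplied by the agent's evidential head


def temporal_iou(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Intersection over Union of two time intervals."""
    intersection = max(0.0, min(a_end, b_end) - max(a_start, b_start))
    union = (a_end - a_start) + (b_end - b_start) - intersection
    return intersection / union if union > 0 else 0.0


def select_winner(proposals: List[Proposal]) -> Proposal:
    """Return the proposal whose agent reports the highest trusted IoU."""
    if not proposals:
        raise ValueError("no proposals to choose from")
    return max(proposals, key=lambda p: p.trusted_iou)


proposals = [
    Proposal("esrl", 12.0, 20.5, trusted_iou=0.71),
    Proposal("agent_b", 10.0, 24.0, trusted_iou=0.58),
]
winner = select_winner(proposals)
print(winner.agent, winner.start, winner.end)

# For reference, IoU between a prediction and a known ground-truth moment:
example_iou = temporal_iou(10.0, 20.0, 15.0, 25.0)
```

At inference time there is no ground-truth moment to compare against, which is why a selection rule like this has to rely on an estimated score instead of the true IoU.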

The “conflict” aspect is where the system shines in detecting out-of-scope queries. The researchers observed a significant phenomenon: when a query is out-of-scope (meaning there’s no matching moment in the video), the different agents tend to have much higher disagreement or “conflict” in their proposed moment locations. This conflict is measured by the difference in their predicted start and end timestamps. By setting a threshold, MARLCC can identify these high-conflict scenarios as out-of-scope queries in a “zero-shot” manner – meaning it doesn’t need to be specifically trained on out-of-scope examples. This is a major advantage for real-world applications.
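A minimal sketch of this thresholding idea is shown below. The article only says that conflict is measured from differences in predicted start and end timestamps; averaging over agent pairs, normalising by video duration, and the threshold value of 0.5 are all assumptions made for illustration, as are the function names.

```python
from itertools import combinations
from typing import List, Tuple

Moment = Tuple[float, float]  # (start, end) in seconds


def conflict_score(moments: List[Moment], duration: float) -> float:
    """Mean pairwise |start difference| + |end difference|, divided by duration.

    Assumed measure: the pairing and the normalisation are illustrative
    choices, not taken from the paper.
    """
    if len(moments) < 2:
        raise ValueError("need predictions from at least two agents")
    if duration <= 0:
        raise ValueError("duration must be positive")
    pairs = list(combinations(moments, 2))
    total = sum(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in pairs)
    return total / (len(pairs) * duration)


def is_out_of_scope(moments: List[Moment], duration: float, threshold: float = 0.5) -> bool:
    """Flag the query as out-of-scope when agents disagree more than the threshold."""
    return conflict_score(moments, duration) > threshold


# Hypothetical predictions from three agents on a 100-second video.
agreeing = [(10.0, 20.0), (11.0, 21.0), (10.0, 19.0)]
disagreeing = [(5.0, 15.0), (60.0, 90.0), (30.0, 40.0)]
print(is_out_of_scope(agreeing, 100.0), is_out_of_scope(disagreeing, 100.0))
```

Because the check only compares the agents’ outputs, it needs no out-of-scope training examples; in practice the threshold would have to be tuned on held-out data.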

Improved Performance and Real-World Applications

Extensive experiments on benchmark datasets like Charades-STA and ActivityNet-Captions demonstrated the effectiveness of MARLCC. The system achieved state-of-the-art results compared to other reinforcement learning-based methods, and even outperformed some non-RL approaches. The ability to detect out-of-scope queries with high accuracy without additional training is particularly valuable, as it prevents the model’s primary moment retrieval ability from being weakened by an extra detection task.

This research opens new avenues for more robust and intelligent video search applications. Users can now be confident that if a query doesn’t have a match, the system can tell them, rather than providing a potentially incorrect or irrelevant result. The findings also highlight the power of modeling competition and conflict within multi-agent systems to enhance reinforcement learning performance.
