
Decomposing User Intent for Efficient On-Device AI

TLDR: A new two-stage method called “Decomposed-FT” significantly improves how small, on-device AI models understand user intentions from app interactions. By first summarizing individual actions and then combining these summaries, this approach allows smaller models to achieve better accuracy, even outperforming larger, more complex AI systems, while maintaining privacy and low latency.

Understanding what users intend to do while interacting with their devices is a critical challenge for developing intelligent agents. While large language models (LLMs) excel at this task, they typically require significant computational resources, are costly to run, and raise privacy concerns because user data is processed in data centers. This makes them less suitable for on-device applications where privacy, low cost, and minimal latency are paramount.

A recent research paper, titled “Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition,” introduces a groundbreaking approach to tackle this problem. Authored by Danielle Cohen, Yoni Halpern, Noam Kahlon, Joel Oren, Omri Berkovitch, Sapir Caduri, Ido Dagan, and Anatoly Efros from Google and Bar-Ilan University, the paper details a novel two-stage method that enables smaller, resource-constrained models to accurately infer user intent, often outperforming even larger models.

The core of their innovation lies in a decomposed strategy. Instead of feeding an entire sequence of user interactions directly to a single model, which can overwhelm smaller systems, they break the task into two manageable stages:

Stage 1: Structured Interaction Summarization

In the first stage, the model processes each individual user interaction—comprising a screenshot of the device interface and the user’s action—to create a concise summary. This summary captures key information about the screen context and the specific action taken. To enhance accuracy, the model also considers the preceding and succeeding interactions, providing crucial context to resolve ambiguities. The summaries are structured to focus on relevant details and avoid speculative interpretations of user intent.
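To make the flow concrete, here is a minimal Python sketch of what this first stage could look like. It is not the paper’s implementation: the Interaction and InteractionSummary structures, the prompt wording, and the call_on_device_mllm placeholder are assumptions standing in for whatever small multimodal model and summary schema the authors actually use.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Interaction:
    screenshot: bytes   # raw screen capture at the moment of the action
    action: str         # e.g. 'tap the "Add to cart" button'

@dataclass
class InteractionSummary:
    screen_context: str        # what the screen shows (app, page, salient elements)
    action_description: str    # what the user did on that screen

def call_on_device_mllm(prompt: str, image: bytes) -> str:
    """Placeholder for the small on-device multimodal model."""
    raise NotImplementedError("Plug in the actual model call here.")

def summarize_interaction(session: List[Interaction], i: int) -> InteractionSummary:
    """Stage 1: summarize one interaction, using its neighbors to resolve ambiguity."""
    prev_action = session[i - 1].action if i > 0 else "none"
    next_action = session[i + 1].action if i + 1 < len(session) else "none"
    prompt = (
        "Summarize this single user interaction in a structured way.\n"
        "Describe only what is visible and what was done; do not speculate about the user's goal.\n"
        f"Previous action: {prev_action}\n"
        f"Current action: {session[i].action}\n"
        f"Next action: {next_action}\n"
        "Answer with two lines, 'screen:' and 'action:'."
    )
    raw = call_on_device_mllm(prompt, session[i].screenshot)
    screen_line, action_line = raw.split("\n", 1)  # assumes the model follows the two-line format
    return InteractionSummary(
        screen_context=screen_line.removeprefix("screen:").strip(),
        action_description=action_line.removeprefix("action:").strip(),
    )
```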


Stage 2: Session-Level Intent Extraction

The summaries from all individual interactions are then aggregated and fed into a second, fine-tuned model. This model’s task is to synthesize these summaries into a single, overarching description of the user’s intent for the entire session. A crucial aspect of this stage is a technique called “label refinement” during training. This process ensures that the model learns to infer intents based solely on the information present in the interaction summaries, preventing it from generating details not supported by the input data.
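Continuing the same hypothetical sketch, the second stage could aggregate those summaries into a single prompt for the fine-tuned model. Again, the prompt text and the call_finetuned_intent_model placeholder are illustrative assumptions rather than the authors’ actual interface; label refinement happens at training time and is only noted in a comment here.

```python
from typing import List

def call_finetuned_intent_model(prompt: str) -> str:
    """Placeholder for the second-stage model, fine-tuned for session-level intent."""
    raise NotImplementedError("Plug in the fine-tuned text model here.")

def extract_session_intent(summaries: List["InteractionSummary"]) -> str:
    """Stage 2: fuse the ordered per-interaction summaries into one overall intent."""
    lines = [
        f"{idx}. screen: {s.screen_context} | action: {s.action_description}"
        for idx, s in enumerate(summaries, start=1)
    ]
    prompt = (
        "Below are ordered summaries of a user's interactions in one session.\n"
        "State the user's overall intent in one sentence, using only information "
        "present in the summaries.\n" + "\n".join(lines)
    )
    # During training, target intent labels are refined so they never reference
    # details that are absent from the summaries, which keeps the model grounded.
    return call_finetuned_intent_model(prompt)
```

A full pipeline would simply map summarize_interaction over every interaction in the session and pass the resulting list to extract_session_intent.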

The researchers evaluated their “Decomposed-FT” (Decomposed Fine-Tuned) approach using small models such as Gemini 1.5 Flash 8B and Qwen2 VL 7B, comparing them against traditional methods such as Chain-of-Thought (CoT) prompting and end-to-end fine-tuning, as well as a large-model baseline, Gemini 1.5 Pro. The results were compelling: the decomposed approach significantly improved intent extraction performance for small models. On the Mind2Web dataset, for instance, the fine-tuned decomposed approach allowed Gemini 1.5 Flash 8B to surpass the larger Gemini 1.5 Pro model using CoT prompting.

An ablation study further highlighted the importance of each design choice, demonstrating that incorporating context from neighboring interactions, using structured summaries, fine-tuning the second stage, and refining training labels all contribute significantly to the method’s success. While the decomposed approach introduces a 2-3x increase in computational cost compared to simple small-model baselines, it remains substantially more efficient and faster than relying on large MLLMs, making it practical for on-device deployment. A latency-optimized variant was also shown to address potential real-time application concerns.

This research marks a significant step towards developing more capable and privacy-preserving AI agents that can run directly on user devices. By enabling small models to achieve superior intent understanding, this method paves the way for enhanced personalization, improved work efficiency, and better recall of past activities, all while keeping sensitive user data private. For more details, see the full paper.

Karthik Mehta (https://blogs.edgentiq.com)
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
