TLDR: This research introduces a novel “projection merge” technique for enabling compositional multi-tasking (like summarizing and translating simultaneously) in Large Language Models (LLMs) directly on mobile devices. By adding a small, learnable layer on top of existing task-specific adapters, the method achieves efficient integration and strong performance with minimal computational overhead. The team developed an Android app to demonstrate its practical viability, highlighting benefits like enhanced privacy and speed for real-world applications, such as cross-lingual conversation summarization.
Large Language Models (LLMs) have transformed how we interact with AI, generating content across text, images, and videos. While many powerful AI applications rely on remote servers, there’s a growing interest in bringing these capabilities directly to our devices, like smartphones. This shift offers significant advantages, especially enhanced privacy, as sensitive data remains securely on your device without being sent over networks.
One of the exciting frontiers in on-device AI is “compositional multi-tasking.” Imagine needing to summarize a long conversation and then translate that summary into another language, all at once. Standard approaches often struggle with such complex, simultaneous tasks. They might require extensive retraining or processing tasks one after another, which can be slow and resource-intensive.
A Novel Approach for On-Device Multi-tasking
Researchers have introduced a new method designed specifically for these compositional multi-tasking scenarios, focusing on summarization and translation. Adapters, such as those produced by Low-Rank Adaptation (LoRA), are an efficient way to fine-tune large language models for specific tasks without modifying the entire model. The new technique, called “projection merge,” adds a small, learnable projection layer on top of existing summarization and translation adapters. This projection layer acts as a bridge, allowing the combined adapters to work together effectively.
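To make the idea concrete, here is a minimal PyTorch-style sketch of one way such a projection merge could be wired up. The class names, the rank-space mixing, and all dimensions are illustrative assumptions, not the authors’ implementation:

```python
import torch
import torch.nn as nn

class LoRAAdapter(nn.Module):
    """A standard low-rank adapter: a down-projection A and an up-projection B."""
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.A = nn.Linear(dim, rank, bias=False)  # d -> r
        self.B = nn.Linear(rank, dim, bias=False)  # r -> d

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.B(self.A(x))

class ProjectionMerge(nn.Module):
    """Hypothetical sketch: keep two pre-trained task adapters frozen and learn
    only a small projection that mixes their low-rank codes for the combined task."""
    def __init__(self, summ: LoRAAdapter, trans: LoRAAdapter, rank: int = 8):
        super().__init__()
        self.summ, self.trans = summ, trans
        for p in list(summ.parameters()) + list(trans.parameters()):
            p.requires_grad = False  # the task adapters stay fixed
        self.proj = nn.Linear(2 * rank, 2 * rank, bias=False)  # the only new weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Down-project with both frozen adapters, mix the codes, then up-project.
        codes = torch.cat([self.summ.A(x), self.trans.A(x)], dim=-1)
        c_summ, c_trans = self.proj(codes).chunk(2, dim=-1)
        return x + self.summ.B(c_summ) + self.trans.B(c_trans)

# Usage: a hidden state passes through the merged adapters as a residual update.
merge = ProjectionMerge(LoRAAdapter(4096), LoRAAdapter(4096))
out = merge(torch.randn(1, 4096))
```

Because only `proj` receives gradients, training for the combined task touches a very small number of weights.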
The key benefit of this design is its efficiency. Compared to alternative strategies that might demand extensive retraining or sequential processing, the projection merge significantly reduces computational overhead. This means your device can handle complex tasks like generating a translated summary from a long conversation much faster and with fewer resources.
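As a rough back-of-the-envelope illustration of that overhead (the dimensions here are assumptions for illustration, not figures from the paper): with hidden size d = 4096 and LoRA rank r = 8, a fresh adapter for the combined task needs 2·d·r weights per adapted layer, while a small mixing projection of the kind sketched above needs only (2r)²:

```python
d, r = 4096, 8                   # assumed hidden size and LoRA rank
new_adapter = 2 * d * r          # a fresh A (d x r) + B (r x d) per adapted layer
projection = (2 * r) ** 2        # one (2r x 2r) mixing matrix per adapted layer
print(projection / new_adapter)  # 0.0039... -> well under 1% of a new adapter
```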
Building an On-Device System
To demonstrate the practical viability of their method, the team developed an Android application capable of executing these compositional tasks seamlessly on a smartphone. This fully on-device system ensures that all computations run locally, further enhancing user privacy and reducing operational costs for service providers.
The application’s architecture includes a user interface, an LLM communication endpoint, an inference API, and components for LLM setup and adapter handling. Developing such a system for mobile devices presented unique challenges. For instance, integrating adapters and loading models efficiently required modifications to existing libraries. Memory management was another hurdle, addressed by moving heavy processing to a separate thread so the user interface stays responsive.
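The app itself is an Android application, but the underlying pattern, keeping the UI thread free while model loading and generation run on a worker, is language-agnostic. Here is a minimal sketch in Python, where `load_model` and `generate` are hypothetical stand-ins for the app’s real setup and inference APIs:

```python
import queue
import threading
import time

# Hypothetical stand-ins for the app's model-setup and inference components.
def load_model(name: str) -> str:
    time.sleep(1.0)  # simulate an expensive model load
    return name

def generate(model: str, prompt: str) -> str:
    time.sleep(0.5)  # simulate on-device generation
    return f"[{model}] translated summary of: {prompt}"

requests_q = queue.Queue()
results_q = queue.Queue()

def inference_worker() -> None:
    # All heavy lifting happens here, off the UI thread.
    model = load_model("base-llm+adapters")
    while True:
        prompt = requests_q.get()  # block until the UI submits a request
        results_q.put(generate(model, prompt))

threading.Thread(target=inference_worker, daemon=True).start()
requests_q.put("long group-chat transcript ...")
print(results_q.get())  # the UI thread stays free until a result is ready
```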
Performance and Practical Benefits
Experimental results show that the solution is both accurate and fast, in cloud-based as well as on-device implementations. The projection merge approach achieved comparable, and in some cases better, performance than stronger but less efficient baselines. Crucially, it adds only a tiny fraction of the parameters and storage that training a completely new adapter for the combined task would require.
For example, in tests on a Samsung Galaxy S23 Ultra, the projection merge method produced translated summaries in about 24 seconds, outperforming the other methods. While this may still feel long for some interactive use cases, it represents a significant step forward for fully on-device AI. The modular design also allows easy extension to additional languages and to other compositional tasks, such as generating reply suggestions combined with translation or tone adjustment.
This research highlights the benefits of the framework for real-world applications that demand fast operation under tight resource constraints. It is particularly valuable for users engaging with foreign-language content, such as travelers participating in local chat groups, who can easily see summaries of conversations in their own language. You can read the full research paper here.
Also Read:
- K-Merge: Smarter Adapter Management for On-Device Language Models
- Reversible Model Merging: Preserving Performance in Low-Rank Compressed Models
Future Outlook
While the current implementation successfully demonstrates the feasibility of on-device compositional multi-tasking, the researchers acknowledge areas for further optimization. These include exploring more aggressive quantization techniques (to reduce model size and speed up inference) and integrating LLMs directly into mobile operating systems for even greater efficiency. Despite these ongoing challenges, this work paves the way for more private, efficient, and powerful AI experiences directly on our personal devices.


