
Efficiently Combining Tasks for AI on Your Phone

TLDR: This research introduces ‘Learnable Calibration’, a new method enabling Large Language Models (LLMs) on devices like smartphones to perform multiple tasks simultaneously (e.g., summarizing and translating text) in a single step. It achieves high performance with minimal additional storage by calibrating existing task-specific adapters, addressing the limitations of current inefficient or ineffective approaches for on-device compositional multi-tasking.

Large Language Models (LLMs) have transformed how we interact with technology, excelling in tasks from answering questions to summarizing text. While powerful, these models often require substantial computational resources, making their deployment on everyday devices like smartphones a significant challenge. A common approach to adapt LLMs for specific tasks is through Parameter-Efficient Fine-Tuning (PEFT), particularly using Low-Rank Adapters (LoRAs). These adapters allow models to learn new skills with minimal additional parameters, making them suitable for on-device use.
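To make the LoRA idea concrete, here is a minimal PyTorch sketch of a low-rank adapter attached to a frozen linear layer. The class name and hyperparameters are illustrative choices, not code from the paper:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a low-rank adapter (illustrative)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen
        d_out, d_in = base.weight.shape
        # Only these two small matrices are trained: rank * (d_in + d_out)
        # parameters, a tiny fraction of the full d_out * d_in weight.
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the scaled low-rank update: W x + scale * B A x
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Because only A and B are trained and stored per task, a device can keep a small library of such adapters (one for summarization, one for translation, and so on) on top of a single shared base model.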

Traditionally, LLMs are fine-tuned for one task at a time. For instance, you might have one adapter for summarization and another for translation. When a user needs to perform multiple tasks, like summarizing a text and then translating that summary, current methods often fall short. Existing ‘model merging’ techniques, which combine multiple task-specific models into one, are typically designed for scenarios where only one task is performed at a time. They struggle when a single input requires the simultaneous execution of multiple tasks, a concept the researchers call ‘compositional multi-tasking’. Imagine needing a translated summary of a long document – this requires both summarization and translation to happen concurrently and efficiently.

The core problem addressed by this research is enabling this ‘compositional multi-tasking’ on resource-constrained devices. Simple, multi-step pipelines (where one task is done, then its output is fed to another task) are inefficient, requiring multiple processing passes and longer times. Training a completely new adapter for every possible combination of tasks is also impractical due to storage limitations on devices.
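The sketch below contrasts the two execution patterns; `run_llm` and the adapter names are hypothetical placeholders, each standing in for one full on-device inference pass:

```python
# Hypothetical helpers for illustration; run_llm() stands in for a single
# full inference pass of the on-device LLM with the named adapter active.

def translated_summary_two_pass(document: str) -> str:
    """Naive pipeline: two sequential passes, roughly double the latency."""
    summary = run_llm(document, adapter="summarization")   # pass 1
    return run_llm(summary, adapter="translation_en_es")   # pass 2

def translated_summary_one_pass(document: str) -> str:
    """Compositional multi-tasking: one pass with a combined adapter."""
    return run_llm(document, adapter="summarize_and_translate")
```

The single-pass version is what Learnable Calibration aims to enable, without having to train and store a dedicated combined adapter for every task pair.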

Introducing Learnable Calibration

To overcome these limitations, researchers from Samsung R&D Institute UK and Samsung Research propose a novel method called Learnable Calibration. The key idea is to leverage the existing task-specific LoRAs already present on a device and then ‘calibrate’ them with a very small number of additional, learnable parameters. This calibration process allows the merged adapters to handle multiple tasks simultaneously in a single inference pass, achieving high performance without significant computational or storage overhead.

The method works by taking linearly merged single-task LoRAs as a starting point. A small set of ‘calibration parameters’ is then learned on compositional task data. These parameters are shared across different layers of the model, further enhancing efficiency. Two variations were explored: a smaller version that learns column-wise biases (Learnable Calibration) and a larger, more expressive version that introduces two low-rank matrices (Learnable Calibration++), as sketched below.
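The paper’s exact parameterization is not reproduced here, so the following is a minimal PyTorch sketch of the idea for a single linear layer, assuming the calibration is added to the merged LoRA update; the class name, shapes, and initialization are illustrative assumptions (and a full implementation would reuse one set of calibration parameters across layers rather than allocate them per layer):

```python
import torch
import torch.nn as nn

class CalibratedMergedLoRA(nn.Module):
    """Sketch: linearly merged task LoRAs plus tiny learnable calibration."""

    def __init__(self, base: nn.Linear, task_loras, calib_rank: int = 4,
                 plus_plus: bool = False):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        d_out, d_in = base.weight.shape
        # Starting point: a simple linear merge (average) of the frozen
        # single-task LoRA updates B @ A, each of shape (d_out, d_in).
        delta = torch.stack([B @ A for A, B in task_loras]).mean(dim=0)
        self.register_buffer("delta", delta)  # fixed, not trained
        self.plus_plus = plus_plus
        if plus_plus:
            # Learnable Calibration++: two small low-rank matrices.
            self.C = nn.Parameter(torch.zeros(d_out, calib_rank))
            self.D = nn.Parameter(torch.randn(calib_rank, d_in) * 0.01)
        else:
            # Learnable Calibration: a column-wise bias on the merged update.
            self.col_bias = nn.Parameter(torch.zeros(d_in))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.plus_plus:
            update = self.delta + self.C @ self.D
        else:
            update = self.delta + self.col_bias  # broadcasts over rows
        return self.base(x) + x @ update.T
```

Only the calibration parameters (the bias, or C and D) are trained on compositional data; the base model and the merged single-task adapters stay frozen, which is what keeps the storage and training cost so low.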

A New Benchmark for On-Device Multi-tasking

To properly evaluate compositional multi-tasking, the researchers developed a new benchmark. This benchmark includes four practical compositional tasks:

  • Summarization combined with translation (English to Spanish, French, or German)
  • Summarization combined with tone adjustment (professional, casual, witty, paraphrase)
  • Reply suggestion combined with translation
  • Reply suggestion combined with tone adjustment

This comprehensive benchmark allows for a thorough assessment of how different approaches handle the complexities of simultaneous task execution on devices with limited resources.


Key Findings and Impact

The experimental results, using models suitable for on-device deployment (like LLaMA 3.2 1B, Qwen2.5 1.5B, and StableLM2 1.6B), demonstrate the effectiveness of Learnable Calibration. Traditional merging strategies and zero-shot approaches performed poorly for compositional multi-tasking. While inefficient baselines (like multi-step LoRA usage or training a dedicated ‘joint-expert’ LoRA for each compositional task) showed good performance, they were resource-intensive.

Learnable Calibration, especially its ‘++’ variation, achieved comparable or even superior performance to these inefficient baselines, but with significantly less overhead. It requires only a minimal number of additional parameters (approximately 0.08–0.56% of a joint-expert LoRA’s parameters), translating to less than 0.5 MB of additional storage. This makes it highly suitable for on-device applications.
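As a rough sanity check of that ratio, here is some back-of-the-envelope arithmetic with assumed dimensions (these numbers are hypothetical, chosen only to show the scale, not taken from the paper):

```python
# Hypothetical dimensions for a ~1B-parameter model (not from the paper):
d_in = d_out = 2048          # hidden size of the adapted linear layers
rank = 16                    # LoRA rank of a dedicated joint-expert adapter
n_layers = 16                # adapted layers; calibration is shared across them

joint_expert = n_layers * rank * (d_in + d_out)   # 1,048,576 params
calibration = d_in                                # one shared column-wise bias
print(f"joint-expert LoRA: {joint_expert:,} params")
print(f"shared calibration: {calibration:,} params "
      f"({100 * calibration / joint_expert:.3f}% of the joint expert)")
# A few thousand extra parameters is kilobytes of storage, far below 0.5 MB.
```

Under these assumptions the calibration comes to roughly 0.2% of the joint-expert adapter, the same order of magnitude as the 0.08–0.56% range reported in the paper.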

The research also showed that Learnable Calibration scales well with different model sizes and can even handle combinations involving three tasks (e.g., summarization, tone adjustment, and translation). The qualitative analysis revealed that while other methods often fail to perform one or both tasks in a compositional setting, Learnable Calibration consistently succeeds in executing all required tasks. This work lays crucial groundwork for advancing the capabilities of LLMs in real-world, resource-constrained multi-tasking scenarios. For more details, you can read the full paper here.

