
Enhancing Multi-Task Learning in Transformers Through Dynamic Token Adjustments

TLDR: DTME-MTL is a new framework that improves transformer-based Multi-Task Learning (MTL) by resolving gradient conflicts in the model’s token space. It categorizes conflicts into range and null space types, applying adaptive token modulation or expansion. This approach efficiently mitigates negative transfer and reduces overfitting with minimal parameter increase, making it a scalable solution for enhancing existing MTL models.

Multi-Task Learning (MTL) is a powerful technique that allows a single neural network to learn multiple tasks simultaneously. This approach can lead to improved generalization, better efficiency, and faster convergence compared to training separate models for each task. However, a significant challenge in MTL is “negative transfer,” where the learning of one task can inadvertently degrade the performance of another due to conflicting objectives.
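To make negative transfer concrete, here is a minimal hard-parameter-sharing MTL model in PyTorch. This is our own illustrative sketch, not code from the paper; the shared encoder is exactly where conflicting task objectives collide:

```python
# A minimal sketch of hard-parameter-sharing MTL (illustrative only):
# one shared encoder feeds several task-specific heads.
import torch
import torch.nn as nn

class SharedBackboneMTL(nn.Module):
    def __init__(self, in_dim=64, hidden=128, task_dims=(10, 1)):
        super().__init__()
        # Shared parameters: every task's gradient flows through these.
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # Task-specific heads, e.g. classification and regression.
        self.heads = nn.ModuleList(nn.Linear(hidden, d) for d in task_dims)

    def forward(self, x):
        z = self.encoder(x)
        return [head(z) for head in self.heads]

model = SharedBackboneMTL()
x = torch.randn(8, 64)
out_cls, out_reg = model(x)
# Summing task losses means the shared encoder receives the *sum* of the
# task gradients -- when those gradients point in opposing directions,
# one task degrades the other: negative transfer.
loss = out_cls.mean() + out_reg.mean()  # stand-in losses for illustration
loss.backward()
```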

While modern transformer-based architectures have greatly advanced MTL performance, their inherent fixed capacity and rigid structure often limit their ability to adapt to the dynamic relationships between different tasks. Existing methods often try to address this by directly converting shared network parameters into task-specific ones, which can be inefficient and lead to excessive growth in the number of parameters, making them less scalable for large transformer models.

A new framework, Dynamic Token Modulation and Expansion (DTME-MTL), has been proposed to tackle these limitations. The approach is designed to work with any transformer-based MTL architecture. Unlike previous methods that modify network parameters, DTME-MTL operates entirely within the transformer’s “token space”: tokens are the vector representations of input elements, such as image patches or words, that flow through the model’s layers.
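To give a feel for what “operating in token space” means, the sketch below appends learnable task-specific tokens to a transformer’s input sequence, loosely in the spirit of prompt tuning. All names here are hypothetical and this is not the paper’s implementation; it only illustrates adapting tokens rather than weights:

```python
# Hypothetical sketch: adaptation happens in the token sequence fed to
# the transformer blocks, not in the blocks' weights.
import torch
import torch.nn as nn

class TaskTokenWrapper(nn.Module):
    def __init__(self, encoder, embed_dim=768, num_tasks=3, tokens_per_task=4):
        super().__init__()
        self.encoder = encoder  # a shared (possibly frozen) transformer encoder
        # One small bank of learnable tokens per task
        # (zero-init here; a small random init is also common).
        self.task_tokens = nn.Parameter(
            torch.zeros(num_tasks, tokens_per_task, embed_dim))

    def forward(self, patch_tokens, task_id):
        # patch_tokens: (batch, seq_len, embed_dim)
        b = patch_tokens.size(0)
        extra = self.task_tokens[task_id].expand(b, -1, -1)
        # Prepend the task's tokens to the shared input sequence.
        return self.encoder(torch.cat([extra, patch_tokens], dim=1))

wrapper = TaskTokenWrapper(nn.Identity())  # identity stands in for an encoder
y = wrapper(torch.randn(2, 196, 768), task_id=0)  # -> shape (2, 200, 768)
```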

The core idea behind DTME-MTL is to identify and resolve gradient conflicts that occur in this token space. The researchers categorize these conflicts into two main types: range space conflicts and null space conflicts. When conflicts arise in the range space, DTME-MTL applies “token modulation” through an affine transformation, essentially adjusting existing tokens. For conflicts in the null space, it employs “token expansion” by introducing new, task-specific tokens.
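The sketch below is one plausible way to picture this taxonomy in PyTorch; the helper names are hypothetical and the exact formulation is in the paper. Each task’s gradient with respect to the shared tokens is split into a component inside the span of the token matrix (its range) and a component orthogonal to it (its null space), and each component is checked for conflicts separately:

```python
# Hypothetical illustration of range/null-space conflict handling;
# consult the paper for the actual math.
import torch

def split_range_null(grad, tokens):
    # Orthonormal basis of the tokens' row space via SVD.
    _, _, Vh = torch.linalg.svd(tokens, full_matrices=False)
    range_part = grad @ Vh.T @ Vh   # component inside the token span
    null_part = grad - range_part   # component outside it
    return range_part, null_part

def conflicts(g_a, g_b):
    # Negative inner product: the two tasks pull in opposing directions.
    return torch.sum(g_a * g_b) < 0

tokens = torch.randn(16, 64)        # shared tokens: (num_tokens, dim)
g1, g2 = torch.randn(2, 16, 64)     # gradients from two tasks w.r.t. tokens
r1, n1 = split_range_null(g1, tokens)
r2, n2 = split_range_null(g2, tokens)

if conflicts(r1, r2):
    # Range-space conflict -> "token modulation": a learnable affine
    # transform (scale and shift) adjusts the existing tokens.
    scale, shift = torch.ones(64), torch.zeros(64)
    modulated = tokens * scale + shift
if conflicts(n1, n2):
    # Null-space conflict -> "token expansion": append new task-specific
    # tokens, since no adjustment within the current span can resolve it.
    new_tokens = torch.randn(4, 64) * 0.02
    tokens = torch.cat([tokens, new_tokens], dim=0)
```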

This adaptive solution enhances the model’s flexibility and helps reduce overfitting, a common problem when models become too specialized to their training data. By focusing on token space manipulation, DTME-MTL achieves efficient adaptation without significantly increasing the model’s parameters. This means it can improve performance without demanding massive computational resources for training larger networks from scratch, thus preserving the benefits of using powerful pre-trained transformers.
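A quick back-of-the-envelope calculation, using assumed dimensions rather than the paper’s exact configuration, shows why adapting tokens is so much cheaper than duplicating network weights:

```python
# Assumed, illustrative dimensions -- not the paper's configuration.
embed_dim = 768            # e.g. a ViT-Base-sized backbone
backbone_params = 86e6     # ~86M parameters for ViT-Base
num_tasks, tokens_per_task = 3, 10

extra = num_tasks * tokens_per_task * embed_dim   # new task tokens
affine = num_tasks * 2 * embed_dim                # per-task scale + shift
overhead = (extra + affine) / backbone_params
print(f"added params: {extra + affine:,.0f} ({overhead:.3%} of backbone)")
# -> added params: 27,648 (0.032% of backbone), the same order as the
#    sub-0.5% overhead reported in the paper.
```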

Extensive experiments have shown that DTME-MTL consistently improves multi-task performance across various datasets, including NYUD-v2, PASCAL-Context, and Taskonomy. Remarkably, these improvements are achieved with minimal computational overhead and only a tiny increase in the total number of network parameters (typically between 0.046% and 0.46% depending on the backbone size). The framework also seamlessly integrates with existing state-of-the-art transformer-based MTL architectures like InvPT and TaskPrompter, further boosting their performance.

The research highlights that resolving conflicts at the token level, rather than directly at the parameter level, offers a more effective strategy to mitigate negative transfer while avoiding issues like overfitting. The paper also provides insights into the optimal timing for network expansion and the importance of applying the right conflict resolution strategy (modulation or expansion) to the correct type of conflict. For more technical details, you can refer to the full research paper: Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning.

Ananya Rao
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
