Advancing Emotion Recognition in Conversations with LLM-Generated Datasets

TLDR: This paper introduces a cost-effective method that uses a small, resource-efficient Large Language Model (LLM) to generate high-quality, diverse datasets for Emotion Recognition in Conversations (ERC). By synthesizing both dialogue and emotion labels, the approach addresses data scarcity, bias, and subjectivity in existing datasets. Experiments show that models trained on these LLM-generated datasets exhibit improved robustness and performance on standard ERC benchmarks, demonstrating the potential of synthetic data to enhance machine intelligence in understanding human emotions.

Emotion Recognition in Conversations (ERC) is a crucial area in artificial intelligence, aiming to understand how emotions shift during human interactions. This capability is vital for developing advanced machine intelligence, especially for applications like social robotics. However, a significant hurdle in ERC research is the lack of high-quality, diverse datasets. Existing datasets often come from biased sources like TV shows or social media, leading to imbalanced emotion distributions. Furthermore, creating these datasets is expensive and time-consuming, involving complex participant recruitment, ethical considerations, and the challenge of consistent and accurate labeling due to the subjective nature of emotions.

Traditional methods for annotating emotional data often involve multiple human annotators, but even then, disagreements are common, leading to reliability issues. Different datasets also use varying sets of emotion labels, speaker numbers, and languages, making it difficult to combine them for broader research. This paper addresses these challenges by exploring a novel approach: leveraging a small, resource-efficient, and general-purpose Large Language Model (LLM) to generate synthetic ERC datasets.

The core idea is that if an LLM can generate both the conversational utterances and their corresponding emotion labels simultaneously, it can significantly improve dataset consistency and reliability. This method also bypasses the high costs and complexities associated with traditional data collection and annotation. The researchers used Vicuna 1.5, a 13-billion-parameter model, which proved capable of generating natural and diverse dialogues while maintaining consistency, offering an affordable and computationally efficient solution.

Two types of datasets were generated for three widely used ERC benchmarks (MELD, EmoryNLP, and IEMOCAP): “Natural” and “Balanced.” Natural datasets are created freely by the LLM without specific emotional biases, reflecting real-world emotion distributions where some emotions (like happiness or neutrality) are more common than others (like fear or disgust). These are useful for developing more realistic interactive systems. Balanced datasets, on the other hand, are designed to address class imbalance by ensuring that specific emotions appear in a significant number of dialogues, even if not uniformly distributed. While less suited for generative applications due to their intentional bias, they are highly effective for developing more accurate emotion classifiers.
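To make the difference between the two dataset types concrete, here is a minimal Python sketch that measures how often each emotion appears. The toy dialogues, the `(speaker, utterance, label)` tuple layout, and the function names are illustrative assumptions, not the paper's data or code.

```python
from collections import Counter

def label_distribution(dialogues):
    """Share of each emotion label across all utterances."""
    counts = Counter(label for dialogue in dialogues for _, _, label in dialogue)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def dialogue_coverage(dialogues, emotion):
    """Fraction of dialogues containing at least one utterance with `emotion`."""
    hits = sum(1 for d in dialogues if any(label == emotion for _, _, label in d))
    return hits / len(dialogues)

# Hand-written toy dialogues (hypothetical, not taken from the generated datasets).
natural = [
    [("A", "Hi!", "joy"), ("B", "Hello.", "neutral"), ("A", "Nice day.", "neutral")],
    [("A", "What's up?", "neutral"), ("B", "I got the job!", "joy")],
]
# A "balanced" set adds dialogues that are required to contain a rare emotion.
balanced = natural + [
    [("A", "Did you hear that noise?", "fear"), ("B", "It's just the wind.", "neutral")],
    [("A", "This milk is spoiled.", "disgust"), ("B", "Throw it out.", "neutral")],
]

print(label_distribution(natural))
print(dialogue_coverage(natural, "fear"), dialogue_coverage(balanced, "fear"))
```

In this toy example the balanced set is still dominated by neutral utterances, but every target emotion now appears in at least one dialogue, which mirrors the "significant number of dialogues, even if not uniformly distributed" description above.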

A key aspect of this data generation process was “prompt engineering.” The researchers carefully crafted prompts to guide the LLM, instructing it to provide speaker names, utterances, and consistent emotion labels within a specific structure. To keep the labels accurate and to prevent the LLM from “hallucinating” or forgetting details, each emotion label was mapped to a numerical symbol, a technique reported to improve LLM performance in logical reasoning tasks. For balanced datasets, the prompt was modified to require that at least one utterance express a specific emotion, and generation iterated through all target labels.
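The prompt structure described above might look roughly like the following Python sketch. The prompt wording, the output line format, the label ordering, and the helper names `build_prompt` and `parse_dialogue` are all assumptions made for illustration; the paper's actual prompts are not reproduced here.

```python
import re

# Illustrative label set; the numbering is an assumption for this sketch.
EMOTIONS = ["neutral", "joy", "sadness", "anger", "fear", "disgust", "surprise"]

def build_prompt(emotions, required_emotion=None, num_turns=6):
    """Build a generation prompt that refers to emotions by number."""
    legend = ", ".join(f"{i}={name}" for i, name in enumerate(emotions))
    lines = [
        f"Write a dialogue of about {num_turns} utterances between named speakers.",
        "Format every line as: <speaker>: <utterance> [<emotion number>]",
        f"Emotion numbers: {legend}.",
    ]
    if required_emotion is not None:
        # Balanced variant: force at least one utterance with the target emotion.
        idx = emotions.index(required_emotion)
        lines.append(
            f"At least one utterance must express emotion {idx} ({required_emotion})."
        )
    return "\n".join(lines)

LINE_RE = re.compile(r"^\s*([^:]+):\s*(.+?)\s*\[(\d+)\]\s*$")

def parse_dialogue(text, emotions):
    """Parse model output into (speaker, utterance, emotion) tuples.

    Lines that do not match the format, or that use an unknown
    emotion number, are dropped.
    """
    turns = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        idx = int(match.group(3))
        if idx >= len(emotions):
            continue
        turns.append((match.group(1).strip(), match.group(2), emotions[idx]))
    return turns

# Balanced generation: one prompt per target label.
balanced_prompts = [build_prompt(EMOTIONS, emotion) for emotion in EMOTIONS]
```

Parsing against a fixed numeric legend also gives a cheap validity check: any generated line whose label falls outside the legend can be discarded before training.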

The utility of these LLM-generated datasets was evaluated using three popular ERC classifier architectures: CoMPM, EmoOne-RoBERTa, and TODKAT. The models were trained on the synthetic data and then fine-tuned on the original benchmark datasets. The results were highly promising: models trained on the LLM-generated datasets consistently showed strong robustness and generalization capabilities. In most cases, they performed comparably to or even better than models trained solely on the original datasets. This demonstrates that even relatively small LLM-generated datasets can significantly enhance ERC classifier performance through a process known as transfer learning.

The study also analyzed how different label distributions affect model performance. Interestingly, the optimal distribution varied by dataset. For MELD, balanced datasets yielded the highest scores, suggesting a need for more evenly distributed labels. For IEMOCAP, natural datasets performed best, indicating that real-world emotion distributions are more beneficial for this benchmark. EmoryNLP remained a challenging dataset, with no clear advantage observed between balanced and natural datasets. These findings highlight that the impact of label distribution is dataset-dependent, emphasizing the importance of tailoring synthetic data generation strategies.

Further statistical validation using the Friedman rank sum test confirmed the significance of these findings, particularly for CoMPM and EmoOne-RoBERTa, where the performance improvements were unlikely to be due to random chance. This research provides a reproducible, affordable, and computationally efficient method for creating high-quality ERC datasets, addressing long-standing challenges in the field. The methodology is also flexible enough to be adapted for other Natural Language Processing tasks. For more technical details, refer to the full research paper.
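For readers unfamiliar with the Friedman rank sum test mentioned above, the following Python sketch computes its statistic for a small table of scores. The scores are made-up numbers, and the layout (rows as repeated runs, columns as three training setups) is an assumption for illustration; neither comes from the paper. The sketch omits the tie correction, and the p-value helper only covers the three-setup case, where the chi-square approximation has two degrees of freedom.

```python
import math

def rank_row(values):
    """Rank values within one row (1 = smallest), averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average
        i = j + 1
    return ranks

def friedman_statistic(rows):
    """Friedman chi-square statistic without tie correction.

    rows[i][j] is the score of setup j in block (run) i.
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, rank in enumerate(rank_row(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

def p_value_three_setups(statistic):
    """Chi-square survival function for 2 degrees of freedom (k = 3 only)."""
    return math.exp(-statistic / 2.0)

# Made-up scores: columns = original only, natural, balanced (hypothetical).
scores = [
    [0.60, 0.62, 0.65],
    [0.58, 0.61, 0.63],
    [0.61, 0.63, 0.66],
    [0.59, 0.60, 0.64],
]
stat = friedman_statistic(scores)
print(stat, p_value_three_setups(stat))
```

Because the test works on within-row ranks rather than raw scores, it only asks whether one setup is ranked consistently above the others across runs. In practice a library routine such as `scipy.stats.friedmanchisquare`, which also handles ties, would be the more robust choice.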
