Advancing Emotion Recognition in Conversations with LLM-Generated Datasets

TLDR: This paper introduces a cost-effective method using a small Large Language Model (LLM) to generate high-quality, diverse datasets for Emotion Recognition in Conversations (ERC). By synthesizing both dialogue and emotion labels, the approach addresses data scarcity, bias, and subjectivity in existing datasets. Experiments show that models trained on these LLM-generated datasets exhibit improved robustness and performance on standard ERC benchmarks, demonstrating the potential of synthetic data to enhance machine intelligence in understanding human emotions.

Emotion Recognition in Conversations (ERC) is a crucial area in artificial intelligence, aiming to understand how emotions shift during human interactions. This capability is vital for developing advanced machine intelligence, especially for applications like social robotics. However, a significant hurdle in ERC research is the lack of high-quality, diverse datasets. Existing datasets often come from biased sources like TV shows or social media, leading to imbalanced emotion distributions. Furthermore, creating these datasets is expensive and time-consuming, involving complex participant recruitment, ethical considerations, and the challenge of consistent and accurate labeling due to the subjective nature of emotions.

Traditional methods for annotating emotional data often involve multiple human annotators, but even then, disagreements are common, leading to reliability issues. Different datasets also use varying sets of emotion labels, speaker numbers, and languages, making it difficult to combine them for broader research. This paper addresses these challenges by exploring a novel approach: leveraging a small, resource-efficient, and general-purpose Large Language Model (LLM) to generate synthetic ERC datasets.

The core idea is that if an LLM can generate both the conversational utterances and their corresponding emotion labels simultaneously, it can significantly improve dataset consistency and reliability. This method also bypasses the high costs and complexities associated with traditional data collection and annotation. The researchers used Vicuna 1.5, a 13-billion-parameter model, which proved capable of generating natural and diverse dialogues while maintaining consistency, offering an affordable and computationally efficient solution.

Two types of datasets were generated for three widely used ERC benchmarks (MELD, EmoryNLP, and IEMOCAP): “Natural” and “Balanced.” Natural datasets are created freely by the LLM without specific emotional biases, reflecting real-world emotion distributions where some emotions (like happiness or neutrality) are more common than others (like fear or disgust). These are useful for developing more realistic interactive systems. Balanced datasets, on the other hand, are designed to address class imbalance by ensuring that specific emotions appear in a significant number of dialogues, even if not uniformly distributed. While less suited for generative applications due to their intentional bias, they are highly effective for developing more accurate emotion classifiers.
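To make the Natural versus Balanced distinction concrete, here is a minimal sketch of two checks one might run on a generated dataset: the overall share of each label, and the number of dialogues in which each emotion appears at least once. The tuple layout `(speaker, utterance, label)`, the function names, and the toy dialogues are all assumptions for illustration, not taken from the paper.

```python
from collections import Counter

def label_distribution(dialogues):
    """Share of each emotion label across all utterances."""
    counts = Counter(label for d in dialogues for _, _, label in d)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def dialogue_coverage(dialogues, emotions):
    """Number of dialogues in which each emotion appears at least once."""
    return {
        e: sum(any(label == e for _, _, label in d) for d in dialogues)
        for e in emotions
    }

# Toy, hand-written dialogues (hypothetical, not LLM output).
TOY_DIALOGUES = [
    [("A", "hi", "neutral"), ("B", "great news!", "joy")],
    [("A", "ok", "neutral"), ("B", "fine", "neutral")],
]

if __name__ == "__main__":
    print(label_distribution(TOY_DIALOGUES))
    print(dialogue_coverage(TOY_DIALOGUES, ["neutral", "joy", "fear"]))
```

A Natural set would typically show a skewed distribution and low coverage for rare emotions such as fear, while a Balanced set would raise the coverage counts for every target label without forcing the per-utterance shares to be uniform.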

A key aspect of this data generation process was “prompt engineering.” The researchers carefully crafted prompts to guide the LLM, instructing it to provide speaker names, utterances, and consistent emotion labels within a specific structure. To keep the labels accurate and to prevent the LLM from “hallucinating” or forgetting details, each emotion label was mapped to a numerical symbol, a technique reported to improve LLM performance in logical reasoning tasks. For balanced datasets, the prompt was modified to require that at least one utterance express a specific emotion, and generation iterated through all target labels in turn.
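The prompting scheme described above can be sketched roughly as follows. This is not the paper's actual prompt: the wording, the line format `<speaker>: <utterance> [<label number>]`, the label numbering, and the helper names are all assumptions made for illustration. The sketch builds a prompt with numeric labels, produces one prompt per target emotion for the balanced case, and parses model output back into labeled utterances while discarding malformed lines.

```python
import re

# Assumed label set and numbering (hypothetical; the paper's mapping may differ).
EMOTIONS = ["neutral", "joy", "surprise", "anger", "sadness", "disgust", "fear"]

def build_prompt(emotions, required=None, turns=6):
    """Build a generation prompt; `required` forces one emotion to appear."""
    legend = ", ".join(f"{i}={name}" for i, name in enumerate(emotions))
    lines = [
        f"Write a conversation of about {turns} utterances.",
        "Format every line as: <speaker>: <utterance> [<label number>]",
        f"Emotion labels: {legend}.",
    ]
    if required is not None:
        idx = emotions.index(required)
        lines.append(f"At least one utterance must carry label {idx} ({required}).")
    return "\n".join(lines)

def balanced_prompts(emotions):
    """One prompt per target emotion, iterating through all labels."""
    return [build_prompt(emotions, required=e) for e in emotions]

LINE_RE = re.compile(r"^(?P<speaker>[^:]+):\s*(?P<text>.+?)\s*\[(?P<label>\d+)\]\s*$")

def parse_dialogue(raw, emotions):
    """Turn raw model output into (speaker, utterance, emotion) tuples.

    Lines that do not match the format, or whose label number is out of
    range, are skipped rather than guessed at.
    """
    rows = []
    for line in raw.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        idx = int(m["label"])
        if idx >= len(emotions):
            continue
        rows.append((m["speaker"].strip(), m["text"], emotions[idx]))
    return rows
```

Dropping lines that fail to parse is one simple way to keep inconsistent labels out of the dataset; a stricter pipeline might instead reject the whole dialogue and regenerate it.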

The utility of these LLM-generated datasets was evaluated using three popular ERC classifier architectures: CoMPM, EmoOne-RoBERTa, and TODKAT. The models were first trained on the synthetic data and then fine-tuned on the original benchmark datasets. Models trained this way consistently showed strong robustness and generalization, and in most cases they performed comparably to or better than models trained solely on the original datasets. This suggests that even relatively small LLM-generated datasets can improve ERC classifier performance through transfer learning.
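The two-stage recipe (train on synthetic data, then continue training on the original data from the same weights) can be illustrated with a deliberately tiny stand-in. The sketch below uses a bag-of-words multiclass perceptron, which is not one of the architectures evaluated in the paper; the class name, the toy sentences, and the learning rates are all hypothetical.

```python
from collections import defaultdict

def featurize(text):
    """Lowercased whitespace tokens as bag-of-words features."""
    return text.lower().split()

class TinyClassifier:
    """Multiclass perceptron; a toy stand-in for a real ERC model."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = {lab: defaultdict(float) for lab in self.labels}

    def predict(self, text):
        toks = featurize(text)
        scores = {lab: sum(self.w[lab][t] for t in toks) for lab in self.labels}
        # Ties go to the label listed first.
        return max(self.labels, key=lambda lab: scores[lab])

    def fit(self, data, epochs, lr):
        """Perceptron updates; calling fit again continues from current weights."""
        for _ in range(epochs):
            for text, gold in data:
                pred = self.predict(text)
                if pred != gold:
                    for t in featurize(text):
                        self.w[gold][t] += lr
                        self.w[pred][t] -= lr
        return self

LABELS = ["sadness", "joy"]

# Hypothetical stand-ins for LLM-generated and benchmark utterances.
SYNTHETIC = [
    ("i am so happy today", "joy"),
    ("this is terrible and sad", "sadness"),
    ("what a wonderful surprise", "joy"),
    ("i feel sad and alone", "sadness"),
]
ORIGINAL = [
    ("so happy for you", "joy"),
    ("i am done with this", "sadness"),
]

# Stage 1: train on synthetic data.
model = TinyClassifier(LABELS).fit(SYNTHETIC, epochs=5, lr=1.0)
# Stage 2: fine-tune the same weights on the original data with a smaller step.
model.fit(ORIGINAL, epochs=3, lr=0.5)
```

The point of the sketch is only the control flow: the second `fit` call starts from the weights learned in the first, so knowledge from the synthetic set carries over while the original data corrects cases the synthetic set did not cover.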

The study also analyzed how different label distributions affect model performance. Interestingly, the optimal distribution varied by dataset. For MELD, balanced datasets yielded the highest scores, suggesting a need for more evenly distributed labels. For IEMOCAP, natural datasets performed best, indicating that real-world emotion distributions are more beneficial for this benchmark. EmoryNLP remained a challenging dataset, with no clear advantage observed between balanced and natural datasets. These findings highlight that the impact of label distribution is dataset-dependent, emphasizing the importance of tailoring synthetic data generation strategies.

Further statistical validation using the Friedman rank sum test indicated that these findings were statistically significant, particularly for CoMPM and EmoOne-RoBERTa, where the performance improvements were unlikely to be due to random chance. This research provides a reproducible, affordable, and computationally efficient method for creating high-quality ERC datasets, addressing long-standing challenges in the field. The methodology is also flexible enough to be adapted for other Natural Language Processing tasks. For more technical details, refer to the full research paper.
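For readers unfamiliar with the Friedman rank sum test, the following sketch computes the statistic by hand for three training conditions compared across several repeated runs. How the paper arranged its blocks and treatments is not stated here, so that layout is an assumption, and the scores are made-up placeholder numbers, not results from the paper. The statistic omits the tie correction, and the p-value uses the closed form that holds only for three conditions (chi-square with two degrees of freedom).

```python
import math

def average_ranks(values):
    """Ranks starting at 1 for the smallest value; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def friedman_statistic(blocks):
    """Friedman chi-square statistic (no tie correction).

    `blocks` is a list of rows; each row holds one score per condition.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3 * n * (k + 1)

def p_value_three_conditions(q):
    """Chi-square survival function for df = 2, valid only when k = 3."""
    return math.exp(-q / 2)

# Placeholder scores: rows are runs, columns are
# (original only, natural + original, balanced + original).
PLACEHOLDER_SCORES = [
    [0.60, 0.63, 0.65],
    [0.58, 0.62, 0.64],
    [0.61, 0.64, 0.63],
    [0.59, 0.63, 0.66],
]

if __name__ == "__main__":
    q = friedman_statistic(PLACEHOLDER_SCORES)
    print(q, p_value_three_conditions(q))
```

In practice a library routine such as SciPy's `scipy.stats.friedmanchisquare` is the safer choice, since it covers any number of conditions and approximates the p-value for you.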

Karthik Mehta
https://blogs.edgentiq.com
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
