
Unpacking AI’s Decisions: Can Large Language Models Explain Themselves?

TL;DR: A study investigates if LLM-generated explanations can improve model classification performance. It finds that these automated explanations are competitive with human ones and consistently boost the performance of pre-trained language models (PLMs). However, for other LLMs, the impact is mixed; explanations can sometimes hinder performance, suggesting that the type of explanation and the model’s internal reasoning play a crucial role.

Large Language Models (LLMs) and other powerful language models have become incredibly adept at various tasks, but their inner workings often remain a mystery. This “black-box” nature makes it hard to understand why they make certain predictions, which is a significant hurdle, especially in sensitive applications. Traditionally, to make these models more transparent, researchers rely on human-written explanations. However, gathering these human explanations is a time-consuming, expensive, and labor-intensive process, making it difficult to scale up.

A recent study by researchers at the Technical University of Munich explores an innovative solution: using LLMs themselves to generate these crucial textual explanations. The paper, titled “Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study,” investigates whether these automatically generated explanations can not only match the quality of human-written ones but also improve the performance of other language models on classification tasks.

Automating Explanation Generation

The researchers developed an automated framework that leverages several state-of-the-art LLMs to create high-quality textual explanations. They used a diverse set of models, including GPT-4o mini, Mixtral-7B, Gemma2-9B, and Llama3-70B, to generate explanations for two different Natural Language Inference (NLI) datasets: e-SNLI and HealthFC. NLI is a fundamental task in natural language processing where models determine the logical relationship between two pieces of text (a premise and a hypothesis).

Explanations were generated in two settings: “zero-shot,” where the LLMs received no examples, and “few-shot,” where they were given a few human-written examples to guide their generation. A key instruction given to the LLMs was to avoid hinting at the correct classification label in their explanations, ensuring the explanations were genuinely about reasoning rather than just repeating the answer.
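
The paper’s exact prompts are not reproduced in this article, so the snippet below is only a rough sketch of the generation step: the prompt wording, the few-shot formatting, and the use of the OpenAI Python client with GPT-4o mini are illustrative assumptions rather than the authors’ implementation.

```python
# Hedged sketch of explanation generation. Prompt wording, function names,
# and few-shot formatting are assumptions, not the paper's exact setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def build_prompt(premise: str, hypothesis: str, few_shot_examples=None) -> str:
    """Ask for a free-text explanation without revealing the gold label."""
    prompt = (
        "Explain the relationship between the premise and the hypothesis.\n"
        "Do NOT state or hint at the classification label (entailment, "
        "neutral, or contradiction) in your explanation.\n\n"
    )
    # Few-shot setting: prepend a handful of human-written examples.
    for ex in few_shot_examples or []:
        prompt += (
            f"Premise: {ex['premise']}\nHypothesis: {ex['hypothesis']}\n"
            f"Explanation: {ex['explanation']}\n\n"
        )
    prompt += f"Premise: {premise}\nHypothesis: {hypothesis}\nExplanation:"
    return prompt


def generate_explanation(premise: str, hypothesis: str, few_shot_examples=None) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": build_prompt(premise, hypothesis, few_shot_examples)}],
        temperature=0.0,
    )
    return response.choices[0].message.content.strip()
```

Leaving out `few_shot_examples` corresponds to the zero-shot setting; passing a few human-written examples corresponds to the few-shot setting.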

Evaluating the Quality of LLM-Generated Explanations

To assess the quality of these LLM-generated explanations, the study employed a comprehensive suite of Natural Language Generation (NLG) metrics. These included traditional measures like BLEU and ROUGE, which look at word overlap, and more modern semantic metrics like BERTScore and MAUVE. They also used an “LLM-as-judge” framework called G-Eval, where another LLM (GPT-3.5-turbo in this case) evaluated the human-likeness, clarity, coherence, and structure of the generated explanations. The findings showed that GPT-4o mini generally produced the highest quality explanations for e-SNLI, while Llama3-70B performed best on HealthFC. Interestingly, the study found that providing a few examples (few-shot setting) only marginally improved the explanation quality, and larger model size didn’t always guarantee better explanations.
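
As a rough illustration of how the reference-based metrics work, the sketch below scores a hypothetical generated explanation against a human-written reference using the Hugging Face evaluate library; the example strings and the choice of this particular tooling are assumptions, not the paper’s actual evaluation code.

```python
# Hedged sketch of automatic explanation scoring; example texts and the
# choice of the Hugging Face `evaluate` library are illustrative assumptions.
import evaluate

generated = ["Someone who is standing on a street corner cannot be sitting at home."]
references = ["The person is standing, therefore they cannot be sitting."]

bleu = evaluate.load("bleu").compute(predictions=generated, references=[[r] for r in references])
rouge = evaluate.load("rouge").compute(predictions=generated, references=references)
bertscore = evaluate.load("bertscore").compute(predictions=generated, references=references, lang="en")

print(f"BLEU: {bleu['bleu']:.3f}")
print(f"ROUGE-L: {rouge['rougeL']:.3f}")
print(f"BERTScore F1: {sum(bertscore['f1']) / len(bertscore['f1']):.3f}")
```

MAUVE, by contrast, is computed over the whole corpus of generated texts, and G-Eval replaces these reference-based scores with ratings produced by a judge LLM.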

Impact on Model Performance: PLMs vs. LLMs

The core of the study investigated how incorporating these LLM-generated explanations affected the performance of both pre-trained language models (PLMs) like BERT, DeBERTa, RoBERTa, and ModernBERT, and other LLMs (GPT-4o mini, Qwen 2.5, and Llama3.3-70B) on NLI tasks.

For PLMs, the results were largely positive: both human-generated and LLM-generated explanations consistently improved predictive performance compared to having no explanations. On the HealthFC dataset, LLM-generated explanations even led to better performance than human explanations in some cases. This suggests that PLMs, when fine-tuned with these explanations, can effectively learn and utilize the additional information provided.
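
How the explanation reaches a PLM classifier is straightforward in principle: the explanation text is supplied alongside the premise-hypothesis pair during fine-tuning. The sketch below shows one plausible way to encode such an input with a DeBERTa checkpoint; the concatenation format, the checkpoint, and the sequence length are assumptions, not the paper’s exact recipe.

```python
# Hedged sketch of feeding an explanation to a PLM classifier; the input
# format and the checkpoint are assumptions, not the study's exact setup.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "microsoft/deberta-v3-base"  # DeBERTa is one of the PLM families used in the study
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

def encode(premise: str, hypothesis: str, explanation: str):
    # Premise and hypothesis go in the first segment, the explanation in the
    # second, so the classifier can attend to the extra rationale text.
    first = f"{premise} {tokenizer.sep_token} {hypothesis}"
    return tokenizer(first, explanation, truncation=True, max_length=256, return_tensors="pt")

batch = encode(
    "A person is standing on a busy street corner.",
    "The person is sitting at home.",
    "Someone who is standing on a street corner cannot be sitting at home.",
)
logits = model(**batch).logits  # standard fine-tuning (e.g. with the Trainer API) proceeds from here
```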

However, the impact on classifier LLMs was more nuanced. In most scenarios, providing LLM-generated explanations did not lead to better performance for the LLM classifiers compared to the baseline without explanations. In fact, for the e-SNLI dataset, these explanations often hurt performance. The researchers suggest this might be because the LLMs used as classifiers were not explicitly trained on the explanations and their internal reasoning processes (akin to Chain-of-Thought) might clash with the provided external explanations. Human explanations were generally more beneficial for LLMs, though their effectiveness still varied significantly between datasets and models.
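
For the LLM classifiers, by contrast, the with-explanation condition amounts to inserting the rationale into the classification prompt at inference time. A minimal sketch, again with assumed prompt wording and GPT-4o mini standing in as the classifier:

```python
# Hedged sketch of the LLM-as-classifier setting; the prompt wording and the
# way the explanation is passed as extra context are assumptions.
from openai import OpenAI

client = OpenAI()

def classify(premise: str, hypothesis: str, explanation: str | None = None) -> str:
    prompt = (
        "Decide whether the hypothesis is entailed by, neutral to, or contradicted "
        "by the premise. Answer with one word: entailment, neutral, or contradiction.\n\n"
        f"Premise: {premise}\nHypothesis: {hypothesis}\n"
    )
    if explanation:  # with-explanation condition; omit for the no-explanation baseline
        prompt += f"Explanation: {explanation}\n"
    prompt += "Label:"
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return response.choices[0].message.content.strip().lower()
```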

Different Types of Explanations Matter

The study also highlighted that the type of explanation plays a crucial role. The e-SNLI dataset features “logic-based” explanations that clarify the reasoning process (e.g., “The person is standing, therefore they cannot be sitting”). The HealthFC dataset, on the other hand, uses “summary-style” explanations that provide additional context and background knowledge (e.g., “Analyzed studies have found a positive effect of the drug on the illness”). The logic-based explanations seemed to interfere with the LLMs’ inherent reasoning, while the summary-style explanations were more helpful, providing useful context.


Conclusion and Future Directions

This research demonstrates the significant potential of using LLMs to automatically generate textual explanations. These automated rationales can be competitive with human annotations and can notably improve the performance of PLMs. While their impact on other LLMs is more complex and task-dependent, the study opens new avenues for augmenting datasets with explanations and enhancing model classification performance. Future work will explore a wider range of datasets, refine prompt engineering, and incorporate newer evaluation metrics to further validate and expand on these findings.

Meera Iyer (https://blogs.edgentiq.com)
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
