TLDR: PLEX is a novel method for explaining Large Language Model (LLM) text classifications. Unlike LIME and SHAP, which rely on slow, computationally expensive perturbations, PLEX uses a Siamese neural network trained once to directly map contextual word embeddings to importance scores. This makes PLEX dramatically faster and more efficient, while still accurately identifying influential words and showing high agreement with traditional explanation methods across various tasks.
Large Language Models (LLMs) have become incredibly powerful tools for tasks like text classification, excelling at understanding and categorizing text. However, their complex internal workings often make it difficult to understand why they make certain predictions. This lack of transparency can be a major hurdle, especially in sensitive areas like healthcare or finance, where trust and accountability are paramount.
To address this, the field of Explainable AI (XAI) has developed methods to shed light on these “black box” models. Two popular local explanation methods, LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations), work by identifying the most influential words in a sentence that contribute to a model’s prediction. For example, if a model predicts a sentence is about “joy,” LIME or SHAP might highlight words like “happy” or “celebrate” as key contributors.
While effective, LIME and SHAP face a significant challenge: they are computationally intensive. These methods typically generate thousands of slightly altered versions of a sentence (perturbations) and then run the LLM on each altered sentence to see how the prediction changes. This process can be incredibly time-consuming and resource-heavy, especially with the large and complex LLMs we use today.
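To make the cost concrete, here is a minimal sketch of the perturbation-based workflow using the `lime` package, with a Hugging Face sentiment pipeline standing in for the LLM classifier (the model choice and sample count are illustrative assumptions, not details from the paper):

```python
# Sketch: perturbation-based explanation with LIME. Every perturbed sentence
# triggers a full forward pass through the classifier -- the expensive step.
import numpy as np
from lime.lime_text import LimeTextExplainer
from transformers import pipeline

clf = pipeline("sentiment-analysis")  # stand-in for the LLM classifier

def predict_proba(texts):
    # Called by LIME on every perturbed copy of the input sentence.
    outputs = clf(list(texts), top_k=None)
    # Keep a fixed label order so each column is a consistent class probability.
    return np.array(
        [[d["score"] for d in sorted(out, key=lambda d: d["label"])] for out in outputs]
    )

explainer = LimeTextExplainer(class_names=["negative", "positive"])
explanation = explainer.explain_instance(
    "I was so happy to celebrate with my friends",
    predict_proba,
    num_features=5,     # top influential words to report
    num_samples=1000,   # perturbed sentences -> 1000 extra model forward passes
)
print(explanation.as_list())  # e.g. [("happy", 0.41), ("celebrate", 0.30), ...]
```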
Introducing PLEX: A Faster, Smarter Way to Explain LLMs
A new approach called PLEX (Perturbation-free Local Explanation) offers a compelling solution to this problem. PLEX is designed to provide local explanations for LLM-based text classification without the need for these expensive perturbations. Instead, it takes a different route.
PLEX works by leveraging the “contextual embeddings” that LLMs naturally generate for words within a sentence. These embeddings are rich numerical representations that capture the meaning of words based on their surrounding context. PLEX then uses a special type of neural network, inspired by “Siamese networks,” which is trained to directly connect these word embeddings with their importance scores. Think of it as teaching a network to understand how much each word contributes to the overall meaning or classification of a sentence.
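The paper's exact architecture isn't reproduced here, but the core idea can be sketched as a small scoring head that maps each token's contextual embedding to a scalar importance score. The encoder choice, layer sizes, and the name `ImportanceScorer` below are illustrative assumptions:

```python
# Sketch of the PLEX idea: score each token's contextual embedding directly,
# with no perturbations of the input sentence.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

class ImportanceScorer(nn.Module):
    """Maps a contextual embedding to a scalar importance score per token."""
    def __init__(self, dim=768, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, embeddings):               # (batch, seq_len, dim)
        return self.net(embeddings).squeeze(-1)  # (batch, seq_len)

scorer = ImportanceScorer()

inputs = tokenizer("I was so happy to celebrate with my friends", return_tensors="pt")
with torch.no_grad():
    embeddings = encoder(**inputs).last_hidden_state  # contextual word embeddings
    scores = scorer(embeddings)                       # one importance score per token
```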
The key innovation here is the “one-off training.” Once this Siamese network is trained, it can generate an explanation for any new sentence almost instantly, without needing to create and process thousands of perturbed versions. This dramatically cuts down on the time and computational power required.
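Continuing the sketch above, the one-off training might look like the loop below. The supervision targets and loss are illustrative assumptions (the paper defines the actual objective), and `train_loader` is a hypothetical DataLoader of (token embeddings, target importance scores):

```python
# Illustrative one-off training loop for the scoring head defined above.
optimizer = torch.optim.Adam(scorer.parameters(), lr=1e-3)

for epoch in range(10):
    for token_embeddings, target_scores in train_loader:  # hypothetical DataLoader
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(scorer(token_embeddings), target_scores)
        loss.backward()
        optimizer.step()

# Once trained, explaining a new sentence is a single encoder pass plus one
# pass through the scorer -- no perturbed sentences, no repeated LLM calls.
new_inputs = tokenizer("Breaking: miracle cure found overnight", return_tensors="pt")
with torch.no_grad():
    new_scores = scorer(encoder(**new_inputs).last_hidden_state)
```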
Demonstrated Effectiveness and Efficiency
The effectiveness of PLEX was rigorously tested across four different text classification tasks: sentiment analysis, fake news detection, COVID-19 fake news detection, and depression prediction. The results were impressive: PLEX showed over 92% agreement with the explanations provided by LIME and SHAP. This means that PLEX largely identifies the same influential words as these established methods.
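The summary doesn't spell out how agreement was measured; one simple way to quantify agreement between two explainers is the overlap of their top-k words, sketched below with made-up scores:

```python
# Hypothetical agreement metric: fraction of top-k words shared by two explainers.
def top_k_agreement(scores_a, scores_b, k=5):
    top_a = {w for w, _ in sorted(scores_a.items(), key=lambda x: -abs(x[1]))[:k]}
    top_b = {w for w, _ in sorted(scores_b.items(), key=lambda x: -abs(x[1]))[:k]}
    return len(top_a & top_b) / k

plex_scores = {"happy": 0.82, "celebrate": 0.61, "friends": 0.20, "so": 0.11, "was": 0.04}
lime_scores = {"happy": 0.74, "celebrate": 0.55, "so": 0.28, "friends": 0.09, "I": 0.02}
print(top_k_agreement(plex_scores, lime_scores, k=3))  # shared fraction of top-3 words
```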
A “stress test” further validated PLEX’s accuracy. This test involved removing the words identified as most important by each explanation method and observing how much the classification accuracy dropped. PLEX caused a similar decline in accuracy as LIME and SHAP, confirming its ability to accurately pinpoint truly influential words. In some cases, PLEX even showed superior performance in capturing the impact of key features.
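The stress test itself can be expressed compactly: remove each explainer's top-ranked words and compare accuracy before and after. The function and variable names below are illustrative, not taken from the paper:

```python
# Sketch of the stress test: remove the top-k words an explainer flags and
# measure how far classification accuracy drops.
def ablate_top_words(sentence, word_scores, k=3):
    top = {w for w, _ in sorted(word_scores.items(), key=lambda x: -abs(x[1]))[:k]}
    return " ".join(w for w in sentence.split() if w not in top)

def accuracy_drop(sentences, labels, explain_fn, predict_fn, k=3):
    base = sum(predict_fn(s) == y for s, y in zip(sentences, labels)) / len(labels)
    ablated = [ablate_top_words(s, explain_fn(s), k) for s in sentences]
    after = sum(predict_fn(s) == y for s, y in zip(ablated, labels)) / len(labels)
    return base - after  # a larger drop means the explainer found truly influential words
```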
Where PLEX truly shines is in its computational efficiency. It accelerates the explanation process by two orders of magnitude in time and four orders of magnitude in computational overhead compared to LIME and SHAP. For instance, explaining a long sentence with a complex LLM might take tens of seconds or even minutes with traditional methods, but PLEX can do it in a few seconds. This makes PLEX a highly practical solution for real-time applications and environments with limited computing resources.
This research offers a promising path forward for making powerful LLMs more transparent and trustworthy, without sacrificing performance or incurring prohibitive costs. For more technical details, refer to the full research paper.


