TLDR: KnowProb is a new method that uses FrameNet to build a frame-based knowledge graph and then probes black-box pre-trained language models (PLMs) to assess whether they understand implicit knowledge beyond the given text. The research shows that both small and large PLMs struggle with this hidden knowledge, often performing worse than random guessing when they are not fine-tuned. Fine-tuning on surface-level tasks improves their performance, but explicitly integrating frame-based knowledge enhances their comprehension much more. Even advanced Large Language Models (LLMs) fall short of human performance at capturing this deeper knowledge, highlighting opportunities for future improvements in AI reasoning.
Pre-trained Language Models (PLMs) have become incredibly powerful, revolutionizing tasks in natural language processing and computer vision. These models, trained on vast amounts of unlabeled data, exhibit impressive reasoning skills. However, their inner workings often remain a mystery, leading to what is known as the ‘black-box’ problem. This lack of transparency raises concerns about their trustworthiness and limits our understanding of how they arrive at their conclusions.
A new research paper introduces an approach called KnowProb, designed to shed light on these black-box models. KnowProb takes a ‘post-hoc explanation’ perspective: it aims to explain a model’s behavior after the model has been trained. The core idea is to probe whether PLMs understand implicit knowledge that goes beyond the surface-level content of a given text.
The Challenge of Hidden Knowledge
Traditional methods for understanding PLMs, often called ‘knowledge probing,’ typically test them on factual knowledge or on linguistic properties such as syntax. While these methods have shown some success, they often miss a deeper level of comprehension. For instance, understanding a simple sentence like “Tom’s grandma was reading a new book when she dropped her glasses” involves more than recognizing the words. It requires understanding the underlying scenario: there is a ‘reader’ (grandma) and a ‘text’ (book) from which a ‘message’ is received. This deeper, hidden knowledge is what KnowProb aims to uncover.
Furthermore, the rapid progress in PLMs, including both small-scale models like BERT and large-scale models like GPT, comes with potential pitfalls such as data biases and evaluation errors. It is often hard to tell whether a model is genuinely reasoning or simply recalling information it saw during training.
Introducing KnowProb: A Knowledge-Guided Probing Approach
To address these challenges, KnowProb integrates concepts from FrameNet, a linguistic database that describes how people understand situations. FrameNet organizes knowledge into ‘frames’ (representing standard situations, like ‘Reading_activity’) and ‘frame elements’ (semantic roles within that situation, like ‘Reader’ or ‘Text’).
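To make the frame versus frame-element distinction concrete, here is a minimal Python sketch of how a frame and one of its instances in a sentence might be represented. The class names (`Frame`, `FrameInstance`) and the simplified `Reading_activity` definition are hypothetical and chosen for illustration; the real FrameNet entry defines more frame elements, and this is not KnowProb’s own data format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Frame:
    """A FrameNet-style frame: a named situation plus its semantic roles."""
    name: str
    frame_elements: tuple[str, ...]


@dataclass
class FrameInstance:
    """A frame evoked in a specific sentence, with roles bound to text spans."""
    frame: Frame
    target: str  # the word in the sentence that evokes the frame
    bindings: dict[str, str] = field(default_factory=dict)

    def bind(self, element: str, span: str) -> None:
        # Reject roles that the frame does not define.
        if element not in self.frame.frame_elements:
            raise ValueError(f"{element!r} is not an element of {self.frame.name}")
        self.bindings[element] = span


# Hypothetical, simplified frame based on the article's example sentence.
READING = Frame("Reading_activity", ("Reader", "Text"))

instance = FrameInstance(READING, target="reading")
instance.bind("Reader", "Tom's grandma")
instance.bind("Text", "a new book")
print(instance.bindings)
# prints: {'Reader': "Tom's grandma", 'Text': 'a new book'}
```

Validating roles against the frame definition mirrors the idea that frame elements only make sense relative to the situation that defines them.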
KnowProb works in several key steps:
1. Frame Semantic Parser: It first uses a specialized parser to identify semantic frames and their associated frame elements within a given text. This helps in modeling the underlying concepts and scenarios.
2. Frame-based Knowledge Graph: Based on the extracted frames and frame elements, KnowProb constructs a knowledge graph. This graph represents the relationships between different frames and their elements, effectively capturing the hidden knowledge that isn’t explicitly stated in the text.
3. Knowledge Probing: The implicit knowledge from this graph is then converted into multiple-choice question-answer pairs. For example, from the sentence about Tom’s grandma, KnowProb might generate a question like: “In the Reading_activity scenario, Tom’s grandma is a [mask]. A. Traveler. B. Reader. C. Hearer.” These questions are then used to probe the black-box PLMs.
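The three steps can be sketched end to end in Python under simplifying assumptions. The parser output (`PARSE`) is hard-coded rather than produced by a real frame semantic parser, the distractor inventory (`FRAME_ELEMENTS`) is invented, and the graph only links frames, frame elements, and text spans, whereas the paper’s graph also covers relationships between different frames. The function names `build_graph`, `make_question`, and `questions_from_graph` are hypothetical.

```python
import random
from collections import defaultdict

# Step 1 (assumed): hypothetical frame semantic parser output for
# "Tom's grandma was reading a new book when she dropped her glasses".
# Each triple is (frame, frame element, text span).
PARSE = [
    ("Reading_activity", "Reader", "Tom's grandma"),
    ("Reading_activity", "Text", "a new book"),
]

# Hypothetical inventory of frames and elements used to draw distractors.
FRAME_ELEMENTS = {
    "Reading_activity": ["Reader", "Text"],
    "Travel": ["Traveler", "Goal"],
    "Hear": ["Hearer", "Message"],
}


def build_graph(parse):
    """Step 2: map each node to a set of (relation, target) edges."""
    graph = defaultdict(set)
    for frame, element, span in parse:
        graph[frame].add(("has_element", element))
        graph[span].add(("fills", f"{frame}.{element}"))
    return graph


def make_question(frame, element, span, rng):
    """Step 3: turn one (frame, element, span) fact into a 3-option item."""
    # One distractor from each other frame, never equal to the gold answer.
    distractors = [
        rng.choice([e for e in elems if e != element])
        for name, elems in FRAME_ELEMENTS.items()
        if name != frame
    ]
    options = distractors[:2] + [element]
    rng.shuffle(options)
    letters = "ABC"
    choices = " ".join(f"{letter}. {opt}." for letter, opt in zip(letters, options))
    question = f"In the {frame} scenario, {span} is a [mask]. {choices}"
    return question, letters[options.index(element)]


def questions_from_graph(graph, rng):
    """Generate one probe per 'fills' edge, in a deterministic order."""
    items = []
    for node, edges in sorted(graph.items()):
        for relation, target in sorted(edges):
            if relation == "fills":
                frame, element = target.split(".")
                items.append(make_question(frame, element, node, rng))
    return items


rng = random.Random(0)  # fixed seed so the option order is reproducible
for question, answer in questions_from_graph(build_graph(PARSE), rng):
    print(question, "-> answer:", answer)
```

Seeding `random.Random` keeps the shuffled option order reproducible, which helps when the same probe set is reused across several models.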
The researchers identified six different types of hidden knowledge to probe, ranging from understanding internal frame elements to reasoning about relationships between different frames.
Key Findings and Implications
The experiments conducted with KnowProb revealed several important insights:
- Limited Hidden Knowledge Capture: Without task-specific fine-tuning, both small-scale (e.g., BERT) and large-scale (e.g., GPT-3.5-turbo, LLaMA) PLMs struggle to capture hidden knowledge, often performing worse than random guessing. This suggests they primarily learn surface-level representations.
- Fine-tuning Helps, But Not Fully: When PLMs are fine-tuned on standard question-answering tasks, their ability to capture hidden knowledge improves. This indicates that even training on surface-level text can help models learn some deeper representations, particularly in establishing connections within the same knowledge hierarchy.
- Knowledge Enhancement is Powerful: A significant finding was that explicitly training models with the frame-based hidden knowledge generated by KnowProb dramatically improved their performance in understanding this implicit information. This also led to competitive results on surface-level QA tasks, highlighting the value of this frame-based knowledge for enhancing model learning.
- LLMs Still Have Limitations: Even the most advanced large language models tested, despite their emergent capabilities, fell short of human performance at capturing hidden knowledge. While some LLMs, such as ERNIE-4.0-8K, performed better than others, none reached human-level understanding (human performance averaged over 92% in the study). This indicates that even powerful LLMs still face challenges in deep, out-of-domain reasoning.
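“Worse than random” has a precise meaning for multiple-choice probes: with k options, uniform guessing has an expected accuracy of 1/k, about 33.3% for a three-option question like the Tom’s grandma example. A minimal sketch of that comparison, assuming three options per question and using toy predictions rather than any results from the paper (the helper names are hypothetical):

```python
def accuracy(predictions, gold):
    """Fraction of probes where the predicted option letter matches the gold one."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("need equally long, non-empty prediction and gold lists")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def chance_baseline(num_options):
    """Expected accuracy of guessing uniformly among num_options choices."""
    return 1.0 / num_options


# Toy data for illustration only; these are not results from the paper.
gold = ["B", "A", "C", "B", "A", "C"]
predictions = ["A", "A", "B", "C", "B", "A"]

acc = accuracy(predictions, gold)
baseline = chance_baseline(3)
print(f"accuracy={acc:.3f}, chance={baseline:.3f}, below chance: {acc < baseline}")
# prints: accuracy=0.167, chance=0.333, below chance: True
```

On a small probe set, an accuracy slightly below 1/k can still fall within sampling noise, so a significance test against the chance rate is a sensible addition before concluding that a model is systematically worse than guessing.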
In conclusion, KnowProb offers an effective and explainable way to identify the limitations of existing black-box language models. It demonstrates that while PLMs are adept at learning from vast amounts of data, they still face significant hurdles in truly understanding the implicit, hidden knowledge that humans effortlessly grasp. This research opens new avenues for improving the reasoning capabilities of future language models.


