TLDR: A new method called KAMIR (Knowledge Analysis via Model Internal Representations) analyzes how familiar an LLM is with input data by examining its internal processing states, without relying on prompt engineering. Experiments show that fine-tuning LLMs with data they are “unfamiliar” with generally leads to better generalization performance, particularly for tasks with concise answers like reading comprehension and multiple-choice QA, by promoting stable convergence, increased prediction uncertainty, and active parameter exploration.
Large Language Models (LLMs) have made incredible strides, largely thanks to processes like pretraining, supervised fine-tuning (SFT), and alignment tuning. Among these, SFT is crucial for tailoring a model’s general knowledge into specific, structured responses. However, a significant challenge remains: how to effectively select the best training data for SFT. Simply adding more data doesn’t always improve performance, and the processes of preparing, sampling, and validating data can be very time-consuming and costly.
Existing data selection methods often rely on analyzing a model’s responses, but these frequently depend on “prompt engineering.” This means they can be sensitive to small changes in how questions are asked and can add extra costs for designing prompts. To overcome these limitations, a new approach called Knowledge Analysis via Model Internal Representations (KAMIR) has been proposed.
What is KAMIR?
KAMIR offers a novel way to analyze data by looking at what’s happening inside the model itself – its “internal representations.” Instead of relying on external prompts, KAMIR assesses data by calculating similarities between the hidden states (or internal processing stages) of each layer within the model and its final hidden state for a given input. This allows researchers to understand how familiar the model is with the input data.
One of KAMIR’s key advantages is its versatility. Unlike previous methods often limited to multiple-choice questions, KAMIR can be applied to a wide array of tasks, including machine reading comprehension and summarization. It can identify data that is useful for training based on the model’s familiarity, even with smaller datasets and simpler classification systems.
How KAMIR Works
The process begins by feeding input data into the LLM without any extra task descriptions. As the model processes this input through its various layers, KAMIR collects the “hidden states” from each layer, specifically focusing on the final token’s representation. It then calculates the similarity (using cosine similarity) between these intermediate hidden states and the final hidden state. This collection of similarity scores forms what is called the “awareness vector” for that input.
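The steps above can be sketched in a few lines of Python. In practice the hidden states would come from a forward pass through the model (e.g. with `output_hidden_states=True` in Hugging Face transformers); here random vectors stand in for them so the sketch stays self-contained, and the layer count and hidden size are illustrative assumptions, not values from the paper.

```python
import numpy as np

def awareness_vector(hidden_states):
    """Build the awareness vector for one input.

    hidden_states: list of per-layer final-token hidden states,
    each a 1-D array; the last entry is the model's final hidden state.
    Returns one cosine-similarity score per layer.
    """
    final = hidden_states[-1]
    final_norm = np.linalg.norm(final)
    sims = []
    for h in hidden_states:
        # Cosine similarity between this layer's final-token state
        # and the final layer's state.
        sims.append(float(np.dot(h, final) / (np.linalg.norm(h) * final_norm)))
    return np.array(sims)

# Illustrative stand-in for real hidden states:
rng = np.random.default_rng(0)
num_layers, hidden_dim = 32, 2048  # assumed sizes, for illustration only
states = [rng.standard_normal(hidden_dim) for _ in range(num_layers)]
vec = awareness_vector(states)
print(vec.shape)  # one score per layer
# The final layer is compared with itself, so the last entry is exactly 1.0.
```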
Based on these awareness vectors, a simple classifier is trained. This classifier learns to distinguish between “familiar” data (information the model was likely trained on, like well-known events before its release) and “unfamiliar” data (information it was unlikely to have learned, such as new events or papers published after its release). While it’s hard to find completely “unlearned” data, the focus is on data less inferable from prior knowledge.
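A minimal version of such a classifier, here a plain logistic regression trained by gradient descent on synthetic awareness vectors, might look like the following. The clean separation between the two synthetic groups is an assumption made for illustration, not a measurement from the paper.

```python
import numpy as np

def train_familiarity_classifier(X, y, lr=0.1, steps=500):
    """Logistic regression on awareness vectors.

    X: (n_samples, n_layers) awareness vectors.
    y: 1 = familiar, 0 = unfamiliar (labels assigned by data recency,
       e.g. events before vs. after the model's release).
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid
        w -= lr * (X.T @ (p - y)) / len(y)      # mean gradient step
        b -= lr * np.mean(p - y)
    return w, b

def predict_familiar(X, w, b):
    return (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5

# Synthetic data: assume familiar inputs show higher layer-wise
# similarities than unfamiliar ones (illustrative only).
rng = np.random.default_rng(1)
X_fam = rng.normal(0.8, 0.05, size=(50, 32))
X_unf = rng.normal(0.5, 0.05, size=(50, 32))
X = np.vstack([X_fam, X_unf])
y = np.array([1] * 50 + [0] * 50)
w, b = train_familiarity_classifier(X, y)
acc = np.mean(predict_familiar(X, w, b) == y)
```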
Experimental Findings: The Power of Unfamiliar Data
Experiments were conducted by fine-tuning a pretrained model (Qwen3-4B-Base) with familiar, unfamiliar, and randomly sampled data across various tasks: SQuAD (reading comprehension), TriviaQA (general QA), MedQA (medical QA), and XLSum and CNN/DailyMail (summarization).

The results were quite insightful: training with unfamiliar data consistently led to better generalization performance across most datasets, outperforming models trained with familiar or randomly sampled data. For tasks like machine reading comprehension and multiple-choice question answering, models trained on unfamiliar data showed significant improvements. This suggests that unfamiliar data provides richer contexts and more diverse question types, enhancing the model’s ability to understand and locate answers.
This improvement was attributed to several factors observed during training: unfamiliar data led to stable convergence (reduced loss), increased prediction uncertainty (higher entropy, meaning the model formed more generalized probability distributions rather than being overly confident), and more active exploration of the parameter space (higher gradient norms).
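The two non-loss signals mentioned above, prediction entropy and gradient norm, are standard quantities. A generic sketch of how they are computed (not the paper's exact instrumentation) is:

```python
import numpy as np

def prediction_entropy(logits):
    """Shannon entropy of the model's predicted distribution.

    Higher entropy means a flatter, less overconfident distribution —
    one of the signals reported for training on unfamiliar data.
    """
    z = logits - logits.max()          # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()    # softmax
    return float(-(p * np.log(p + 1e-12)).sum())

def gradient_norm(grads):
    """Global L2 norm over all parameter gradients — a proxy for how
    actively the optimizer is exploring the parameter space."""
    return float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))

# A peaked (overconfident) distribution has lower entropy than a flat one:
peaked = np.array([10.0, 0.0, 0.0, 0.0])
flat = np.array([1.0, 1.0, 1.0, 1.0])
print(prediction_entropy(peaked) < prediction_entropy(flat))  # True
```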
However, the impact varied for summarization tasks. For abstractive summarization (XLSum), the quality difference between models trained on familiar and unfamiliar data was marginal. For extractive summarization (CNN/DailyMail), training on unfamiliar data sometimes led to higher loss and inferior results. This is because unfamiliar training can encourage greater output diversity, which diverges from the specific reference summaries that extractive tasks reward.
Conclusion
KAMIR offers a robust, prompt-independent method for analyzing intrinsic knowledge in LLMs by examining their internal representations. The study demonstrates that strategically training LLMs with data they are “less familiar” with can significantly boost their generalization performance, especially for tasks requiring precise, concise answers. This research provides a new perspective on selecting training data and utilizing intrinsic knowledge to make LLM training more efficient and effective.


