TLDR: A new research paper reveals that large language models (LLMs) decode relational facts not as isolated pieces of information, but by extracting recurring, coarse-grained semantic properties. The study demonstrates that linear operators responsible for this decoding can be highly compressed using order-3 tensor networks, significantly reducing model size without losing accuracy. While generalization to new relations is limited in general language, the approach shows robust performance in structured domains like arithmetic, suggesting a property-centric organization of knowledge within LLMs.
Large Language Models (LLMs) have become incredibly powerful, but understanding how they store and process factual knowledge remains a significant challenge. A recent research paper titled “The Structure of Relation Decoding Linear Operators in Large Language Models” by Miranda Anna Christ, Adrián Csiszárik, Gergely Becsó, and Dániel Varga delves into this very question, investigating the underlying structure of how LLMs decode specific relational facts.
The paper builds upon previous work by Hernandez et al. (2023), which introduced the concept of Linear Relational Embedding (LRE) matrices. These are essentially linear operators that can effectively approximate the transformation from a subject’s embedding to an object’s tokens for a given relation. For instance, in the triplet (“Paris”, “capital city of”, “France”), an LRE matrix would map the embedding of “Paris” to the tokens representing “France”.
Compressing Relational Knowledge
One of the paper’s key findings is the remarkable compressibility of these relation decoding operators. Imagine having hundreds of these matrices, each with millions of parameters. Stacking them together would result in billions of parameters. The researchers demonstrate that an entire collection of these linear decoder functions can be highly compressed into a single, compact model using a technique called order-3 tensor networks. This compression is achieved without a significant loss in decoding accuracy. For example, in experiments with GPT-J, tensor networks with less than a million parameters significantly outperformed baselines that used many more parameters.
The researchers explored two main tensor network architectures: the SimpleOrder3Network and the TriangleTensorNetwork. These networks use core tensors and projection matrices to represent the relations in a much more compact form. This compact representation not only reduces redundancy but also hints at a deeper, shared semantic structure across different relations.
Property-Based Decoding, Not Relation-Specific
To understand why such high compression is possible, the authors developed a novel “cross-evaluation protocol.” This involves applying a linear decoder trained for one relation (e.g., “characteristic gender”) to the subjects of another semantically related relation (e.g., “university degree gender”). Their results revealed a surprising insight: these linear maps do not encode distinct, fine-grained relations. Instead, they extract recurring, coarse-grained semantic properties.
For example, the decoder for “country of capital city” might also perform reasonably well when applied to “food from country” because both relations share the underlying “country-of-X” property. This property-centric structure clarifies both the operators’ compressibility and explains why they generalize only to new relations that are semantically close. The findings suggest that linear relational decoding in transformer language models is primarily property-based, rather than being specific to individual relations.
Generalization Capabilities
The study also investigated whether these compressed tensor networks could generalize to entirely new, unseen relations. While generalization was limited on diverse, general language data, the models showed robust performance on a specially constructed “mathematics dataset” consisting of arithmetic relations like “number plus X” and “number minus X.” In this controlled environment, the tensor network model achieved near-perfect accuracy on both training and test sets, demonstrating its ability to learn and generalize underlying mathematical properties.
Ablation studies further confirmed the importance of meaningful embeddings for relations, subjects, and objects. Randomizing these embeddings drastically reduced performance, indicating that the tensor network relies on the semantic information encoded within them.
Also Read:
- The Hidden Geometry of AI’s Memory: A New Understanding
- Decoding How Large Language Models Filter Information
Implications and Future Directions
This research offers significant implications for the future of LLMs. By compressing hundreds of relation decoders into small tensor-network modules, it drastically reduces parameter counts, leading to lower memory and compute requirements. This could broaden access to advanced AI in resource-constrained settings. Furthermore, by exposing coarse-grained properties, the work enhances the interpretability of LLMs, making it easier for practitioners to understand, debug, and refine factual knowledge in deployed systems.
However, the authors also acknowledge limitations, noting that their study relies on linear approximations from relatively smaller LLMs. How these findings scale to larger, more complex models remains an open question. The true space of human knowledge is vast, and this work represents a partial exploration of relational structure. Future work could explore broader generalization and the integration of these findings with techniques like LoRA and mixture-of-experts models.
For more detailed information, you can read the full research paper here.


