
Unveiling the Geometric Structures of Knowledge in Large Language Models

TLDR: This research introduces Supervised Multi-Dimensional Scaling (SMDS), a new method to automatically discover how Large Language Models (LLMs) organize concepts into structured “feature manifolds.” Using temporal reasoning as a case study, the researchers found that these manifolds (e.g., circles for dates, lines for durations) are intuitive, consistent across models, dynamically adapt to tasks, and are essential for LLM reasoning. SMDS offers a quantitative way to identify and compare these internal geometric representations, suggesting LLMs employ an “entity-based reasoning pipeline” for processing structured information.

Large Language Models (LLMs) are incredibly powerful, but understanding how they process and represent information internally remains a significant challenge. A new research paper, titled “Shape Happens: Automatic Feature Manifold Discovery in LLMs via Supervised Multi-Dimensional Scaling,” introduces a novel method to shed light on these hidden mechanisms.

The core idea behind this research is the “linear representation hypothesis,” which suggests that LLMs encode concepts as specific directions or structures, known as feature manifolds, within their complex internal spaces. Previous attempts to uncover these structures often faced limitations, such as a lack of generalization or reliance on fixed assumptions about the data’s geometry. This new work addresses these issues by introducing Supervised Multi-Dimensional Scaling (SMDS).

What is SMDS?

SMDS is a model-agnostic dimensionality reduction technique that extends traditional Multi-Dimensional Scaling by incorporating supervision. Essentially, it allows researchers to define a desired geometric shape (like a circle, a line, or clusters) based on the labels of the data. SMDS then finds a way to project the high-dimensional internal representations of the LLM onto a low-dimensional space that best matches this predefined geometry. This approach transforms the problem of discovering these hidden structures into a more manageable “model selection” problem, where different geometric assumptions can be quantitatively compared.
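
To make the model-selection idea concrete, here is a minimal sketch, assuming a simple least-squares projection of activations onto label-derived target coordinates. This is not the paper's actual SMDS objective (a supervised extension of multi-dimensional scaling), and the toy activations below are placeholders, but it shows how two geometric hypotheses can be compared quantitatively:

```python
import numpy as np

def geometry_fit_error(hidden_states, target_coords):
    """Fit a linear map from activations onto an assumed target geometry.

    hidden_states: (n, d) activations for n labeled inputs.
    target_coords: (n, k) coordinates the labels imply under some shape
                   (points on a circle, a line, cluster centers, ...).
    Returns the least-squares residual; a lower value means the shape
    explains the representation better (the "model selection" step).
    """
    W, *_ = np.linalg.lstsq(hidden_states, target_coords, rcond=None)
    return np.linalg.norm(hidden_states @ W - target_coords)

# Toy data: plant a circular structure (think months of the year) in a
# small hidden space plus noise, so the circle hypothesis should win.
rng = np.random.default_rng(0)
n, d = 12, 8
angles = 2 * np.pi * np.arange(n) / n
circle = np.stack([np.cos(angles), np.sin(angles)], axis=1)
H = circle @ rng.normal(size=(2, d)) + 0.05 * rng.normal(size=(n, d))

line = np.arange(n, dtype=float).reshape(-1, 1)  # competing linear shape
print("circle fit error:", geometry_fit_error(H, circle))  # near zero
print("line fit error:  ", geometry_fit_error(H, line))    # larger
```

In practice, comparing residuals across shapes of different dimensionality would need normalization or held-out data; the sketch only shows the mechanics of scoring one geometric assumption against another.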

Key Discoveries

The researchers, Federico Tiblias, Irina Bigoulaeva, Jingcheng Niu, Simone Balloccu, and Iryna Gurevych, applied SMDS to temporal reasoning tasks as a primary case study, revealing several fascinating insights:

  • Intuitive Structures Across Models: SMDS consistently found that temporal entities (like dates, durations, or historical events) form feature manifolds with intuitive structures. For instance, dates often form circular patterns, while durations might align along logarithmic lines, reflecting how LLMs compress temporal magnitudes. These patterns were stable across different model families and sizes, suggesting a universal way LLMs encode this type of knowledge.
  • Dynamic Adaptation to Tasks: The study showed that these feature manifolds are not static. They dynamically adjust and reshape in response to the specific task or prompt given to the LLM. For example, a model might represent dates in a circular fashion for a general date task, but then map them to linearly separable clusters when asked to classify them by season or temperature. This indicates that LLMs actively transform their internal representations to suit the reasoning required.
  • Active Role in Reasoning: Perhaps the most crucial finding is that these feature manifolds are not just passive representations; they actively support the LLM’s reasoning process. The researchers demonstrated this by introducing noise into these specific manifold-aligned subspaces. Even small perturbations significantly impaired the model’s reasoning performance, while similar noise in random subspaces had little effect. Furthermore, the quality of these manifolds directly correlated with the model’s accuracy on downstream tasks, especially in higher-performing LLMs (a schematic sketch of this perturbation setup follows the list).
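
The perturbation experiment in the last bullet can be sketched schematically as follows. This is not the authors' code: the activation and both subspace bases below are random placeholders, whereas a real test would take activations from the model and the manifold basis from the SMDS fit.

```python
import numpy as np

def perturb_in_subspace(activation, basis, scale, rng):
    """Add Gaussian noise confined to the span of `basis` (d x k, orthonormal).

    Per the paper's finding, noise along manifold-aligned directions
    should impair reasoning far more than equal-magnitude noise along
    random directions of the same rank.
    """
    return activation + basis @ rng.normal(scale=scale, size=basis.shape[1])

rng = np.random.default_rng(0)
d, k = 4096, 2                    # assumed hidden size and manifold rank
activation = rng.normal(size=d)   # placeholder for a real LLM activation

# A real test would take the manifold basis from the SMDS projection;
# both bases here are random placeholders showing the mechanics only.
manifold_basis, _ = np.linalg.qr(rng.normal(size=(d, k)))
random_basis, _ = np.linalg.qr(rng.normal(size=(d, k)))

perturbed_aligned = perturb_in_subspace(activation, manifold_basis, 0.1, rng)
perturbed_control = perturb_in_subspace(activation, random_basis, 0.1, rng)
```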

Beyond temporal reasoning, the researchers also successfully applied SMDS to other domains, such as geographic knowledge, where it uncovered spherical manifolds for city locations, aligning with the true geometry of the underlying domain. This demonstrates the versatility of the SMDS method for exploring various types of structured features.
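
As an illustration of how such a spherical hypothesis could be expressed as a target geometry, latitude/longitude labels can be mapped to points on the unit sphere. The city labels and this exact encoding are hypothetical examples for illustration, not taken from the paper:

```python
import numpy as np

def sphere_targets(lat_deg, lon_deg):
    """Map latitude/longitude labels to unit-sphere coordinates, giving
    a spherical target geometry that SMDS-style fitting could test."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=1)

# Hypothetical city labels (lat, lon in degrees): Berlin, Tokyo, Nairobi.
targets = sphere_targets(np.array([52.5, 35.7, -1.3]),
                         np.array([13.4, 139.7, 36.8]))
print(targets.round(3))
```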

Implications for Understanding LLMs

These findings provide compelling evidence that LLMs don’t just store information as isolated facts but organize it into coherent, structured representations that are crucial for their reasoning abilities. The concept of “feature binding,” where information is transferred and transformed across different parts of a sentence or reasoning process, is reinforced by the observation that entire feature manifolds are preserved and propagated. This work opens new avenues for understanding how LLMs think and could lead to improvements in model design, control, and even the diagnosis of biases. For more technical details, see the full research paper, “Shape Happens: Automatic Feature Manifold Discovery in LLMs via Supervised Multi-Dimensional Scaling.”
