TLDR: This research paper explores a new approach to measuring and understanding Artificial Intelligence by studying its ability to infer and execute hidden rules in a puzzle-like environment called the Game of Hidden Rules (GOHR). Using a Transformer-based Reinforcement Learning agent, the study compares Feature-Centric (FC) and Object-Centric (OC) state representations, finding that OC models generalize better and show more consistent performance across rule types. The work aims to lay the groundwork for a ‘metrology’ of AI, akin to how psychology studies human intelligence, by analyzing learning efficiency, transfer effects, and generalization capabilities.
As Artificial Intelligence continues to advance at an unprecedented pace, its impact on society is becoming as profound as that of the steam engine centuries ago. Yet, just as the steam engine ran for decades before thermodynamics explained it, we lack a fundamental science that tells us how AI works, what its limits are, and what it can do. A recent research paper, “Toward a Metrology for Artificial Intelligence: Hidden-Rule Environments and Reinforcement Learning”, takes on this challenge, proposing a novel approach to measuring and understanding machine intelligence.
Borrowing from Psychology to Understand AI
The authors suggest that psychology offers a powerful lens through which to study AI. For well over a century, psychologists have developed methods to understand human intelligence by observing behavior in controlled environments, using instruments that probe the mind’s responses to specific stimuli and challenges. This paper advocates extending these methods to machine intelligence, focusing on tasks that require an AI to “discover a hidden rule” through trial and error, much as humans solve puzzles.
The Game of Hidden Rules (GOHR)
To put this idea into practice, the researchers use a custom-built environment called the Game of Hidden Rules (GOHR). Imagine a 6×6 board holding pieces of various shapes and colors. The AI agent’s goal is to clear the board by placing these pieces into four designated buckets, but here’s the catch: the rules governing which piece goes into which bucket are hidden. The AI must infer these rules while simultaneously learning the best strategy to play, all based on partial observations and feedback from its actions.
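To make the setup concrete, here is a minimal sketch of what a GOHR-style environment could look like. It is a hedged reconstruction from the description above, not the paper’s actual code: the `HiddenRuleEnv` class, the `Piece` attributes, and the example `color_rule` are all illustrative names.

```python
import random
from dataclasses import dataclass

COLORS = ["red", "blue", "yellow", "black"]
SHAPES = ["circle", "square", "star", "triangle"]

@dataclass
class Piece:
    color: str
    shape: str
    row: int
    col: int

class HiddenRuleEnv:
    """Toy GOHR-style board: clear a 6x6 grid into 4 buckets.

    The hidden rule is a predicate (piece, bucket) -> bool that the agent
    never sees; it only observes which moves are accepted or rejected.
    """

    def __init__(self, rule, n_pieces=9, seed=0):
        self.rule = rule
        rng = random.Random(seed)
        cells = rng.sample([(r, c) for r in range(6) for c in range(6)], n_pieces)
        self.pieces = [Piece(rng.choice(COLORS), rng.choice(SHAPES), r, c)
                       for (r, c) in cells]

    def step(self, piece_idx, bucket_idx):
        """Try to move one piece into one bucket; reward 0 if the hidden
        rule accepts the move, -1 if it rejects it (the piece then stays)."""
        piece = self.pieces[piece_idx]
        if self.rule(piece, bucket_idx):
            self.pieces.pop(piece_idx)
            reward = 0.0
        else:
            reward = -1.0
        done = len(self.pieces) == 0
        return self.pieces, reward, done

# One illustrative hidden rule: each of the four colors maps to one bucket.
def color_rule(piece, bucket_idx):
    return COLORS.index(piece.color) == bucket_idx
```

The agent only ever sees the board state and the accept/reject feedback; the predicate itself stays hidden, which is what forces rule inference rather than memorized lookup.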
How AI Learns in GOHR
The AI in this study learns through Reinforcement Learning (RL), specifically a Transformer-based Advantage Actor-Critic (A2C) algorithm, which lets the agent improve its policy from the outcomes of its own moves. A key aspect of the methodology is that the agent can perceive the game board in two distinct ways, sketched in code after the list:
- Feature-Centric (FC) Representation: This approach focuses on global board features, essentially encoding where specific shapes or colors are located on the grid. It’s like the AI sees a map of features.
- Object-Centric (OC) Representation: In contrast, this method represents each individual game piece as an object with its own features (color, shape, position). It’s like the AI sees a list of distinct items.
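Reusing the `Piece`, `COLORS`, and `SHAPES` definitions from the environment sketch above, the two views might be encoded roughly as follows. The exact channel layout and token format are assumptions; the paper’s encodings may differ in detail.

```python
import numpy as np

def encode_fc(pieces):
    """Feature-centric: one 6x6 channel per color and per shape,
    i.e. a 'map of features' over the whole board."""
    grid = np.zeros((len(COLORS) + len(SHAPES), 6, 6), dtype=np.float32)
    for p in pieces:
        grid[COLORS.index(p.color), p.row, p.col] = 1.0
        grid[len(COLORS) + SHAPES.index(p.shape), p.row, p.col] = 1.0
    return grid  # shape (8, 6, 6)

def encode_oc(pieces):
    """Object-centric: one token per piece, concatenating one-hot color,
    shape, row, and column -- a 'list of distinct items' that can be fed
    to a Transformer as a set of tokens."""
    tokens = []
    for p in pieces:
        color = np.eye(len(COLORS), dtype=np.float32)[COLORS.index(p.color)]
        shape = np.eye(len(SHAPES), dtype=np.float32)[SHAPES.index(p.shape)]
        row = np.eye(6, dtype=np.float32)[p.row]
        col = np.eye(6, dtype=np.float32)[p.col]
        tokens.append(np.concatenate([color, shape, row, col]))
    return np.stack(tokens)  # shape (n_pieces, 20)
```

The FC tensor ties every feature to an absolute cell, while the OC tokens keep position as just one more attribute of each piece; this difference foreshadows the generalization results below.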
The reward signal itself is deliberately simple: a successful move earns zero, while an invalid move incurs a penalty of minus one, so the only way to score well is to clear the board with as few rejected moves as possible.
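The paper couples this reward with a Transformer-based A2C learner. The snippet below shows only the generic one-step A2C update terms under the 0/-1 reward convention, assuming PyTorch and scalar value estimates; it is a sketch of the standard algorithm, not the authors’ training code.

```python
import torch

def a2c_losses(policy_logits, value, next_value, action, reward, gamma=0.99):
    """Generic one-step Advantage Actor-Critic terms for one transition."""
    # TD advantage: with rewards in {0, -1}, it dips whenever a move is rejected.
    advantage = reward + gamma * next_value - value
    log_prob = torch.log_softmax(policy_logits, dim=-1)[action]
    policy_loss = -log_prob * advantage.detach()  # actor: reinforce good moves
    value_loss = advantage.pow(2)                 # critic: fit the value estimate
    return policy_loss, value_loss

# Dummy usage: 9 pieces x 4 buckets = 36 candidate moves per step.
logits = torch.randn(36)
pl, vl = a2c_losses(logits, value=torch.tensor(-3.0),
                    next_value=torch.tensor(-2.0), action=17, reward=-1.0)
```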
Unpacking Rule Difficulty and Learning
The researchers conducted extensive experiments, training models on 18 different hidden rules and analyzing how hard rules with different properties were to learn under each representation strategy:
- For the Feature-Centric (FC) model: Rules based on piece positions (like placing pieces in specific quadrants or based on proximity) were the easiest to learn. Rules depending on individual piece features (like color or shape) were moderately difficult, while abstract concepts like “bucket ordering” or “feature ordering” proved much more challenging.
- For the Object-Centric (OC) model: This model showed more consistent performance across rule types. Rules based on piece features (color, shape) and quadrant mapping were easiest. Bucket ordering was slightly harder, and positional rules (like reading order or proximity) were more challenging than for the FC model. As with FC, feature ordering and conditional rules remained the most difficult.
A crucial observation was that the OC model generally exhibited smaller differences in learning difficulty between various rule properties, suggesting a more robust and adaptable understanding.
Learning from Experience: Transfer Effects
The study also explored “transfer effects” – how learning one rule influences the ability to learn another. They found that if an AI is fully pretrained on the individual components of a complex rule (e.g., learning rule A and rule B before learning A+B), it learns the combined rule significantly faster. However, partial pretraining or introducing unrelated rules often led to slower or degraded learning, highlighting the importance of structured learning pathways.
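As a rough illustration of that finding, the sketch below runs full pretraining on each component rule before the combined one. The `train_on` callback and the conjunction-style composition of rules are hypothetical stand-ins; the paper does not publish this loop, and its combined rules need not be simple conjunctions.

```python
def conjunction(rule_a, rule_b):
    """One illustrative way to build rule A+B: a move must satisfy both."""
    return lambda piece, bucket: rule_a(piece, bucket) and rule_b(piece, bucket)

def curriculum(train_on, rule_a, rule_b, episodes=5000):
    """Full pretraining on each component before the combined rule.

    train_on(rule, episodes) is a stand-in for an A2C training loop such
    as the one sketched earlier, returning the trained agent or metrics.
    """
    train_on(rule_a, episodes)                               # master A alone
    train_on(rule_b, episodes)                               # then B alone
    return train_on(conjunction(rule_a, rule_b), episodes)   # A+B: faster now
```

The contrast case in the study would correspond to skipping one of the first two calls (partial pretraining) or replacing it with an unrelated rule, both of which tended to slow or degrade learning of the combined rule.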
Beyond Training: Generalization
To test true understanding, the models were trained on a restricted part of the board (a checkerboard pattern) and then tested on the entire board; a sketch of this protocol follows the list. The results were telling:
- FC models: Showed high error rates (50-75%, sometimes up to 99-100% for positional rules) when encountering unseen positions. This suggests FC models tend to memorize specific positional patterns rather than abstracting the underlying rules.
- OC models: Demonstrated much stronger generalization, with lower error rates (around 49-55%, up to 65% for positional rules). This indicates that OC models are better at associating behavior with object-level features, allowing them to apply learned rules to new situations.
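In code, that train/test split might look like the following; the precise checkerboard mask is our guess at the paper’s restricted layout, and the 6×6 geometry matches the environment sketch above.

```python
def checkerboard_cells(train=True):
    """Half of the 6x6 board, selected by cell parity: one parity class is
    seen during training, the complementary half only appears at test time."""
    parity = 0 if train else 1
    return [(r, c) for r in range(6) for c in range(6) if (r + c) % 2 == parity]

train_cells = checkerboard_cells(train=True)               # 18 cells seen in training
test_cells = [(r, c) for r in range(6) for c in range(6)]  # all 36 at test time

# FC models, keyed to absolute cells, stumble on the 18 unseen positions;
# OC models, keyed to per-piece features, transfer far more gracefully.
```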
Towards a Science of AI Intelligence
The research concludes that it may be possible to define a “difficulty space” for AI tasks, in which tasks that lie close together are of similar difficulty for a given artificial intelligence. The authors also note that different AI systems, and even different ways of measuring difficulty, might yield different geometries for this space. They propose a future in which computer scientists and psychologists collaborate to build the foundations of a new science, potentially called “Cognodynamics,” to rigorously describe and measure machine intelligence.