TLDR: The research paper “From Prediction to Understanding: Will AI Foundation Models Transform Brain Science?” by Thomas Serre and Ellie Pavlick examines both the promise and the limitations of AI foundation models for brain science. While these models achieve remarkable predictive accuracy on neural activity and human decision-making, the authors argue that predictive success does not equate to scientific understanding of the underlying biological mechanisms. After explaining how self-supervised learning and the pretrain-finetune paradigm work, the paper makes the case for moving beyond mere prediction to mechanistic explanation, critically examining instances where models exploit statistical regularities rather than genuine causal processes and drawing parallels to historical scientific models. It also offers optimism through the burgeoning field of mechanistic interpretability, which aims to uncover the internal computational structure of AI models and thereby generate testable hypotheses for neuroscience. The authors conclude that integrating foundation models with established brain-science theories and prioritizing mechanistic understanding is essential if these models are to make a genuine scientific contribution.
Artificial intelligence, particularly through the advent of foundation models, is rapidly expanding its influence beyond traditional language tasks and into complex scientific domains like brain science. These powerful AI systems, exemplified by models like ChatGPT, learn from vast, unstructured datasets without direct human supervision, enabling them to adapt to a wide array of tasks. While their predictive accuracy is impressive, a critical question arises: can these models truly help us understand the intricate mechanisms of the brain, or do they merely predict outcomes?
The core idea behind these models is self-supervised learning (SSL), where the AI learns by predicting missing parts of its own input. Generative pretraining (the ‘GPT’ in ChatGPT) is a common form, training models to predict the next element in a sequence. This ‘pretrain-finetune’ approach allows models to acquire broad knowledge, which is then adapted for specific applications. This paradigm extends beyond language to vision, audio, and even neural and behavioral data, using a shared concept of ‘tokens’ to represent different data types.
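To make the objective concrete, here is a minimal PyTorch sketch of next-token prediction, the core of generative pretraining. The vocabulary size, model dimensions, and random token sequences are illustrative stand-ins rather than the architecture of any model discussed in the paper; the point is simply that the training labels come from the data itself, shifted by one step.

```python
import torch
import torch.nn as nn

# Minimal sketch of generative pretraining (next-token prediction) on a toy
# vocabulary. Sizes and tokens are illustrative placeholders, not the actual
# architectures used by the foundation models discussed in the paper.
vocab_size, embed_dim, seq_len, batch = 100, 32, 16, 8

embed = nn.Embedding(vocab_size, embed_dim)
layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True)
readout = nn.Linear(embed_dim, vocab_size)

tokens = torch.randint(0, vocab_size, (batch, seq_len))  # stand-in for text, spikes, choices...
inputs, targets = tokens[:, :-1], tokens[:, 1:]          # shift by one: predict the next token

causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len - 1)
hidden = layer(embed(inputs), src_mask=causal_mask)      # each position sees only its past
logits = readout(hidden)                                 # (batch, seq, vocab)

loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()  # self-supervised: the "labels" are just the input shifted by one step
```

Because the same recipe only requires a stream of tokens, swapping text tokens for binned neural activity or discretized behavioral choices changes the data, not the learning objective.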
Recent breakthroughs highlight the potential of foundation models in neuroscience. For instance, a neural foundation model trained on calcium imaging data from the mouse visual cortex can predict neural responses with high accuracy across different stimuli and individual animals. Similarly, Centaur, a behavioral foundation model, predicts human decision-making across numerous psychology experiments, often outperforming classical cognitive models. These models offer the exciting prospect of creating ‘digital twins’ – generative foundation models that could simulate neural or behavioral time series indistinguishable from real biological data, opening doors for in silico experimentation and personalized medicine.
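As a rough illustration of what in silico experimentation with such a ‘digital twin’ might look like, the sketch below rolls synthetic activity forward from a generative model one token at a time. The `pretrained_model` and its token encoding are hypothetical placeholders, not a released API from either study.

```python
import torch
import torch.nn as nn

def simulate(pretrained_model: nn.Module, prompt: torch.Tensor, steps: int) -> torch.Tensor:
    """Hypothetical digital-twin rollout: autoregressively sample `steps` new
    tokens (e.g. binned neural activity or behavioral choices) given observed
    context. Assumes the model maps a (1, t) token sequence to (1, t, vocab) logits."""
    seq = prompt.clone()                        # (1, t) tokens observed so far
    for _ in range(steps):
        logits = pretrained_model(seq)[:, -1]   # predicted distribution over the next token
        nxt = torch.multinomial(logits.softmax(dim=-1), num_samples=1)
        seq = torch.cat([seq, nxt], dim=1)      # feed the sample back in as new context
    return seq                                  # synthetic continuation of the recording
```

A simulation like this could in principle stand in for an experiment on a real subject, which is exactly why the question of whether the model captures the right mechanisms matters so much.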
However, the paper emphasizes a crucial distinction: prediction is not explanation. Just as Ptolemy’s epicycles accurately predicted planetary motion without reflecting the true celestial mechanics, AI models can achieve high predictive accuracy by exploiting statistical regularities rather than uncovering genuine causal mechanisms. Game-playing models, for example, can reach superhuman performance, yet it remains debated whether they truly encode the game’s rules or simply rely on pattern matching. Similarly, a foundation model trained on Newtonian mechanics data achieved high predictive accuracy but failed to generalize to related physics tasks, suggesting it had not internalized the underlying physical laws.
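The gap between fitting data and capturing the generating law is easy to reproduce in miniature. In the toy example below (my illustration, not an experiment from the paper), a flexible polynomial fits an exponentially decaying signal almost perfectly inside the observed range yet goes badly wrong once asked to extrapolate, because it never encodes the decay law itself.

```python
import numpy as np

# Toy illustration only: a curve fit that predicts well in-distribution but has
# not "understood" the underlying law. The data follow y = exp(-t).
t_train = np.linspace(0.0, 2.0, 50)           # observed regime
y_train = np.exp(-t_train)

coeffs = np.polyfit(t_train, y_train, deg=6)  # flexible polynomial "model"

t_far = 5.0                                   # extrapolation, outside the training range
print("fit error inside range:", np.abs(np.polyval(coeffs, t_train) - y_train).max())
print("prediction at t=5     :", np.polyval(coeffs, t_far), "vs true", np.exp(-t_far))
```

Inside the training range the polynomial is essentially indistinguishable from the true curve; at t=5 its prediction is off by orders of magnitude, the same failure mode the authors highlight for models that learn regularities rather than laws.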
The evidence for current neural and behavioral foundation models capturing genuine brain mechanisms remains limited. While the calcium-imaging model learns weight structures partially consistent with anatomy, it doesn’t fully reveal how the cortex computes. Centaur, despite its predictive power, sometimes diverges from human behavior in known psychological experiments and can rely on statistical regularities incompatible with human decision-making. These examples underscore that predictive alignment alone is insufficient for mechanistic explanation; we risk replacing one ‘black box’ (the brain) with another (a deep neural network).
To move towards mechanistic understanding, the emerging field of mechanistic interpretability seeks to reveal the computational structure within these AI models. Researchers are mapping functional subcircuits, analyzing attention weights, and identifying specialized components that perform specific computations. This work has uncovered ‘concept cells’ in AI models, similar to ‘grandmother cells’ in human neuroscience, and shown that semantic relationships can be encoded as consistent geometric patterns, leading to testable hypotheses for neuroscience. For more details, you can read the full paper here.
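One widely used interpretability tool in this spirit is the linear probe: a simple classifier trained on a model’s hidden activations to test whether a concept is explicitly, linearly represented. The sketch below uses random placeholder activations and labels; with activations extracted from a real foundation model and genuine concept annotations, above-chance probe accuracy would amount to a concrete, testable claim about what the network encodes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative linear probe on placeholder data. In practice, `activations`
# would be hidden states extracted from a foundation model for a set of
# stimuli, and `concept` a binary annotation of those stimuli (e.g. "contains
# a face"). With random data, accuracy should stay near chance.
rng = np.random.default_rng(0)
activations = rng.normal(size=(1000, 256))   # one 256-d hidden vector per stimulus
concept = rng.integers(0, 2, size=1000)      # hypothetical concept labels

probe = LogisticRegression(max_iter=1000).fit(activations[:800], concept[:800])
print("held-out probe accuracy:", probe.score(activations[800:], concept[800:]))
# Reliable above-chance accuracy on real data would suggest the concept is
# linearly readable from the representation, a falsifiable hypothesis that
# could then be tested against neural recordings.
```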
While AI scaling laws drive predictive gains, biological systems are shaped by evolutionary and developmental constraints. This asymmetry means AI models may not naturally converge on the path-dependent mechanisms of biology. However, by embedding biologically grounded architectures and focusing on interpretability, foundation models can still yield theoretical insights and generate testable hypotheses, fostering a crucial feedback loop between theory and experiment. Ultimately, the scientific value of foundation models in neuroscience depends on our ability to transform them from data-fitting machines into theory-bearing scientific instruments that reveal not just what intelligence can accomplish, but how it works.


